**SQL window functions is considered a ‘hard’ concept in SQL. This set of exercises is designed to challenge your SQL muscle and help internalize data wrangling using window functions in SQL. The questions are of three levels of difficulty: Easy, Intermediate and Hard.**

If you haven’t read the tutorial yet, read **SQL Window Functions – Made Simple and Intuitive**.

Try solving hands-on in MySQL using the link provided in each question.

*Author’s note: The solutions provided is (mostly) one of the several possible solutions. The goal is not to conform with the provided solution. Try reach the desired output shown.*

## Q1. Find the running cumulative total demand.

Difficulty Level: Intermediate

**Task:**

From the `demand2`

table, find the cumulative total sum for `qty`

.

**Input:**

```
SELECT * FROM demand2;
```

day | qty |
---|---|

1 | 10 |

2 | 6 |

3 | 21 |

4 | 9 |

6 | 12 |

7 | 18 |

8 | 3 |

9 | 6 |

10 | 23 |

**Desired Output:**

day | qty | cumQty |
---|---|---|

1 | 10 | 10 |

2 | 6 | 16 |

3 | 21 | 37 |

4 | 9 | 46 |

6 | 12 | 58 |

7 | 18 | 76 |

8 | 3 | 79 |

9 | 6 | 85 |

10 | 23 | 108 |

Solve Hands-On: HERE, Table Schema and data: Gist

Show SolutionSELECT day, qty, SUM(qty) OVER(ORDER BY day) as cumQty FROM demand2;

## Q2. Find the running cumulative total demand by product.

Difficulty Level: Intermediate

**Task**:

From the `demand`

table, find the cumulative total sum for `qty`

for each `product`

category.

**Input:**

product | day | qty |
---|---|---|

A | 1 | 10 |

A | 2 | 6 |

A | 3 | 21 |

A | 4 | 9 |

A | 5 | 19 |

B | 1 | 12 |

B | 2 | 18 |

B | 3 | 3 |

B | 4 | 6 |

B | 5 | 23 |

**Desired Solution:**

product | day | qty | CUMSUM |
---|---|---|---|

A | 1 | 10 | 10 |

A | 2 | 6 | 16 |

A | 3 | 21 | 37 |

A | 4 | 9 | 46 |

A | 5 | 19 | 65 |

B | 1 | 12 | 12 |

B | 2 | 18 | 30 |

B | 3 | 3 | 33 |

B | 4 | 6 | 39 |

B | 5 | 23 | 62 |

Solve Hands-On: HERE, Table Schema and data: Gist

Show SolutionSELECT product, day, qty, SUM(qty) OVER(PARTITION BY product ORDER BY day) as CUMSUM FROM demand;

## Q3. When are the top 2 worst performing days for each product?

Difficulty Level: Intermediate

**Task**:

Extract the two worst performing days of each product in terms of number of `qty`

sold. Paraphrasing it: Get the days corresponding to the two minimum most values of `qty`

for each product.

**Input:**

product | day | qty |
---|---|---|

A | 1 | 10 |

A | 2 | 6 |

A | 3 | 21 |

A | 4 | 9 |

A | 5 | 19 |

B | 1 | 12 |

B | 2 | 18 |

B | 3 | 3 |

B | 4 | 6 |

B | 5 | 23 |

**Desired Solution:**

product | day | qty | RN |
---|---|---|---|

A | 2 | 6 | 1 |

A | 4 | 9 | 2 |

B | 3 | 3 | 1 |

B | 4 | 6 | 2 |

Solve Hands-On: HERE, Table Schema: Gist

Show Solutionwith tbl as (SELECT PRODUCT, qty, day, row_number() over (partition by product order by qty) as RN FROM demand) select product, day, qty from tbl a where RN in (1,2)

## Q4. Find the percentage increase in `qty`

compared to the previous day.

Difficulty Level: Intermediate

**Task:**

Sort the table by `qty`

for each product and compute the percentage increase (or decrease) compared to the previous day.

**Input:**

product | day | qty |
---|---|---|

A | 1 | 10 |

A | 2 | 6 |

A | 3 | 21 |

A | 4 | 9 |

A | 5 | 19 |

B | 1 | 12 |

B | 2 | 18 |

B | 3 | 3 |

B | 4 | 6 |

B | 5 | 23 |

**Desired Solution:**

product | day | qty | qty_lag | perc_increase |
---|---|---|---|---|

A | 2 | 6 | 10 | -40 |

A | 3 | 21 | 6 | 250 |

A | 4 | 9 | 21 | -57.14 |

A | 5 | 19 | 9 | 111.11 |

B | 2 | 18 | 12 | 50 |

B | 3 | 3 | 18 | -83.33 |

B | 4 | 6 | 3 | 100 |

B | 5 | 23 | 6 | 283.33 |

Solve Hands-On: HERE, Table Schema: Gist

Show Solutionwith tbl as (select product, day, qty, lag(qty, 1) over (partition by product order by day) as qty_lag from demand) select *, round(((qty - qty_lag)/qty_lag) * 100, 2) as perc_increase from tbl where qty_lag is not null;

## Q5. Show the minimum and maximum ‘qty’ sold for each product as separate columns.

**Difficulty Level:** Easy

**Task:**

Create two new columns in the table that shows the minimum and the maximum quantity sold for each product.

**Input:**

product | day | qty |
---|---|---|

A | 1 | 10 |

A | 2 | 6 |

A | 3 | 21 |

A | 4 | 9 |

A | 5 | 19 |

B | 1 | 12 |

B | 2 | 18 |

B | 3 | 3 |

B | 4 | 6 |

B | 5 | 23 |

**Desired Solution:**

product | day | qty | min_qty | max_qty |
---|---|---|---|---|

A | 1 | 10 | 6 | 21 |

A | 2 | 6 | 6 | 21 |

A | 3 | 21 | 6 | 21 |

A | 4 | 9 | 6 | 21 |

A | 5 | 19 | 6 | 21 |

B | 1 | 12 | 3 | 23 |

B | 2 | 18 | 3 | 23 |

B | 3 | 3 | 3 | 23 |

B | 4 | 6 | 3 | 23 |

B | 5 | 23 | 3 | 23 |

Solve Hands-On: HERE, Table Schema: Gist

Show Solutionselect *, min(qty) over(partition by product) as min_qty, max(qty) over(partition by product) as max_qty from demand;

## Q6. Calculate the difference between the second largest and the second smallest sales qty

Difficulty Level: Hard

**Task:**

Calculate the diffence between the second largest and the second smallest sales `qty`

for each product.

**Input:**

product | day | qty |
---|---|---|

A | 1 | 10 |

A | 2 | 6 |

A | 3 | 21 |

A | 4 | 9 |

A | 5 | 19 |

B | 1 | 12 |

B | 2 | 18 |

B | 3 | 3 |

B | 4 | 6 |

B | 5 | 23 |

**Desired Output:**

product | day | qty |
---|---|---|

A | 4 | 9 |

A | 5 | 19 |

B | 4 | 6 |

B | 2 | 18 |

Solve Hands-On: HERE, Table Schema: Gist

Show Solutionselect product, day, qty from (select *, row_number() over (partition by product order by qty) as rownum, count(*) over (partition by product) as total_recs from demand) a where (rownum = 2) or rownum = (total_recs - 1);

## Q7. On each day, which product had the highest sales?

Difficulty Level: Intermediate

**Task:**

Create a table to show the `day`

and the names of the product the the highest `qty`

sale.

**Input:**

product | day | qty |
---|---|---|

A | 1 | 10 |

A | 2 | 6 |

A | 3 | 21 |

A | 4 | 9 |

A | 5 | 19 |

B | 1 | 12 |

B | 2 | 18 |

B | 3 | 3 |

B | 4 | 6 |

B | 5 | 23 |

**Desired Output:**

day | product | qty | maxqty |
---|---|---|---|

1 | B | 12 | 12 |

2 | B | 18 | 18 |

3 | A | 21 | 21 |

4 | A | 9 | 9 |

5 | B | 23 | 23 |

Solve Hands-On: HERE, Table Schema: Gist

Show Solutionwith tbl as ( select day, product, qty, max(qty) over (partition by day) as maxqty from demand) select day, product from tbl where qty = maxqty;

*Questions numbers Q8 to Q15 uses the same table as below.*

## Q8. Create row numbers in increasing order of sales.

**Difficulty Level:** Intermediate

**Task:**

Create row numbers in increasing order of `sales`

, starting with 1 in each location

**Input:**

product | location | sales | refunds |
---|---|---|---|

A | S1 | 9 | 2 |

B | S1 | 8 | 1 |

B | S1 | 7 | 3 |

B | S2 | 51 | 8 |

C | S2 | 33 | 10 |

D | S2 | 12 | 3 |

A | S2 | 10 | 2 |

B | S2 | 152 | 17 |

E | S3 | 101 | 23 |

C | S3 | 73 | 12 |

F | S3 | 63 | 19 |

F | S3 | 23 | 4 |

D | S3 | 5 | 0 |

A | S3 | 5 | 0 |

**Desired Output:**

product | location | sales | refunds | rn |
---|---|---|---|---|

B | S1 | 7 | 3 | 1 |

B | S1 | 8 | 1 | 2 |

A | S1 | 9 | 2 | 3 |

A | S2 | 10 | 2 | 1 |

D | S2 | 12 | 3 | 2 |

C | S2 | 33 | 10 | 3 |

B | S2 | 51 | 8 | 4 |

B | S2 | 152 | 17 | 5 |

D | S3 | 5 | 0 | 1 |

A | S3 | 5 | 0 | 2 |

F | S3 | 23 | 4 | 3 |

F | S3 | 63 | 19 | 4 |

C | S3 | 73 | 12 | 5 |

E | S3 | 101 | 23 | 6 |

Solve Hands-On: HERE, Table Schema: Gist

Show Solutionselect *, row_number() over(partition by location order by sales) as rn from demand;

## Q9. Find the top products (Rank 1 and 2) within each location

Difficulty: Intermediate

**Task:**

Extract the names of top selling products (rank 1 and 2) in each location. That is, the products with highest and second highest sales in each region.

**Input:**

product | location | sales | refunds |
---|---|---|---|

A | S1 | 9 | 2 |

B | S1 | 8 | 1 |

B | S1 | 7 | 3 |

B | S2 | 51 | 8 |

C | S2 | 33 | 10 |

D | S2 | 12 | 3 |

A | S2 | 10 | 2 |

B | S2 | 152 | 17 |

E | S3 | 101 | 23 |

C | S3 | 73 | 12 |

F | S3 | 63 | 19 |

F | S3 | 23 | 4 |

D | S3 | 5 | 0 |

A | S3 | 5 | 0 |

**Desired Output:**

location | product | total_sales | rn |
---|---|---|---|

S1 | B | 15 | 1 |

S1 | A | 9 | 2 |

S2 | B | 203 | 1 |

S2 | C | 33 | 2 |

S3 | E | 101 | 1 |

S3 | F | 86 | 2 |

Solve Hands-On: HERE, Table Schema: Gist

Show Solutiondrop table if exists temp; create table temp as with tbl as (select location, product, sum(sales) as total_sales from demand group by location, product order by location, total_sales desc) select *, row_number() over (partition by location order by total_sales desc) as rn from tbl; select * from temp where rn <= 2;

## Q10. What is the total sales from Top 3 products in each location?

Difficulty: Intermediate

**Task:**

Calculate the total sales from the top 3 best selling products in each location.

**Input:**

product | location | sales | refunds |
---|---|---|---|

A | S1 | 9 | 2 |

B | S1 | 8 | 1 |

B | S1 | 7 | 3 |

B | S2 | 51 | 8 |

C | S2 | 33 | 10 |

D | S2 | 12 | 3 |

A | S2 | 10 | 2 |

B | S2 | 152 | 17 |

E | S3 | 101 | 23 |

C | S3 | 73 | 12 |

F | S3 | 63 | 19 |

F | S3 | 23 | 4 |

D | S3 | 5 | 0 |

A | S3 | 5 | 0 |

**Desired Output:**

location | product | total_sales | rn |
---|---|---|---|

S1 | B | 15 | 1 |

S1 | A | 9 | 2 |

S2 | B | 203 | 1 |

S2 | C | 33 | 2 |

S2 | D | 12 | 3 |

S3 | E | 101 | 1 |

S3 | F | 86 | 2 |

S3 | C | 73 | 3 |

Solve Hands-On: HERE, Table Schema and data: Gist

Show Solutionselect location, product, total_sales, rn from (select *, row_number() over(partition by location order by total_sales desc) as rn from (select location, product, sum(sales) as total_sales from demand group by location, product order by location, total_sales desc) ) where rn <= 3;

## Q11. Calculate the proportion of sales from each location

Difficulty: Intermediate

**Task:**

Calculate the proportion of sales from each location and show this against each record.

**Input:**

product | location | sales | refunds |
---|---|---|---|

A | S1 | 9 | 2 |

B | S1 | 8 | 1 |

B | S1 | 7 | 3 |

B | S2 | 51 | 8 |

C | S2 | 33 | 10 |

D | S2 | 12 | 3 |

A | S2 | 10 | 2 |

B | S2 | 152 | 17 |

E | S3 | 101 | 23 |

C | S3 | 73 | 12 |

F | S3 | 63 | 19 |

F | S3 | 23 | 4 |

D | S3 | 5 | 0 |

A | S3 | 5 | 0 |

**Desired Output:**

product | location | sales | refunds | total_location | total_sales | ratio |
---|---|---|---|---|---|---|

A | S1 | 9 | 2 | 24 | 552 | 0.0435 |

B | S1 | 8 | 1 | 24 | 552 | 0.0435 |

B | S1 | 7 | 3 | 24 | 552 | 0.0435 |

B | S2 | 51 | 8 | 258 | 552 | 0.4674 |

C | S2 | 33 | 10 | 258 | 552 | 0.4674 |

D | S2 | 12 | 3 | 258 | 552 | 0.4674 |

A | S2 | 10 | 2 | 258 | 552 | 0.4674 |

B | S2 | 152 | 17 | 258 | 552 | 0.4674 |

E | S3 | 101 | 23 | 270 | 552 | 0.4891 |

C | S3 | 73 | 12 | 270 | 552 | 0.4891 |

F | S3 | 63 | 19 | 270 | 552 | 0.4891 |

F | S3 | 23 | 4 | 270 | 552 | 0.4891 |

D | S3 | 5 | 0 | 270 | 552 | 0.4891 |

A | S3 | 5 | 0 | 270 | 552 | 0.4891 |

SELECT *, total_location, round(cast(total_location as float) / cast(total_sales as float), 4) as ratio from ( SELECT *, sum(sales) over(partition by location rows between unbounded preceding and unbounded following) as total_location, sum(sales) over(rows between unbounded preceding and unbounded following) as total_sales FROM demand );

## Q12. Which products contribute to top 80% of total sales?

Difficulty: Hard

**Task:**

Calculate and extract the best selling products whose total sales does not exceed 80% of overall sales. Show the cumulative percentage contribution to sales as a percentage as well.

**Input:**

product | location | sales | refunds |
---|---|---|---|

A | S1 | 9 | 2 |

B | S1 | 8 | 1 |

B | S1 | 7 | 3 |

B | S2 | 51 | 8 |

C | S2 | 33 | 10 |

D | S2 | 12 | 3 |

A | S2 | 10 | 2 |

B | S2 | 152 | 17 |

E | S3 | 101 | 23 |

C | S3 | 73 | 12 |

F | S3 | 63 | 19 |

F | S3 | 23 | 4 |

D | S3 | 5 | 0 |

A | S3 | 5 | 0 |

**Desired Output:**

product | rn | total_product_sales | cum_total_sales | total_sales | cum_perc_sales |
---|---|---|---|---|---|

D | 1 | 17 | 17 | 552 | 0.03 |

A | 2 | 24 | 41 | 552 | 0.074 |

F | 3 | 86 | 127 | 552 | 0.23 |

E | 4 | 101 | 228 | 552 | 0.41 |

C | 5 | 106 | 334 | 552 | 0.60 |

Solve Hands-On: HERE, Table Schema: Gist

Show SolutionSELECT *, round((cum_total_sales/total_sales), 4) as cum_perc_sales FROM (select product, row_number() over (order by total_product_sales) as rn, total_product_sales, sum(total_product_sales) over (order by total_product_sales) as cum_total_sales, sum(total_product_sales) over (rows between unbounded preceding and unbounded following) as total_sales from (select *, sum(sales) as total_product_sales from demand group by product order by total_product_sales desc ) ) where perc_sales <= 0.8;

## Q13. What is the median value of sales overall?

Difficulty: Intermediate

**Task:**

Calculate the median sales value overall irrespective of product or location.

**Input:**

product | location | sales | refunds |
---|---|---|---|

A | S1 | 9 | 2 |

B | S1 | 8 | 1 |

B | S1 | 7 | 3 |

B | S2 | 51 | 8 |

C | S2 | 33 | 10 |

D | S2 | 12 | 3 |

A | S2 | 10 | 2 |

B | S2 | 152 | 17 |

E | S3 | 101 | 23 |

C | S3 | 73 | 12 |

F | S3 | 63 | 19 |

F | S3 | 23 | 4 |

D | S3 | 5 | 0 |

A | S3 | 5 | 0 |

**Desired Output:**

product | location | sales | refunds |
---|---|---|---|

D | S2 | 12 | 3 |

Solve Hands-On: HERE, Table Schema: Gist

*Note: SQLite does not support floor and ceil functions, so make sure to try out in MySQL or a db that supports these functions.*

ion;

```

select location, avg(sales) as median from (select *, row_number() over (order by sales) as rn, count(*) over () as total_rows from demand) as tbl where rn in (floor(total_rows/2), ceil(total_rows/2)) group by location;

## Q14. What is the median value of sales for each location?

Difficulty: Hard

**Task:**

Calculate the median sales value across all product within each location.

**Input:**

product | location | sales | refunds |
---|---|---|---|

A | S1 | 9 | 2 |

B | S1 | 8 | 1 |

B | S1 | 7 | 3 |

B | S2 | 51 | 8 |

C | S2 | 33 | 10 |

D | S2 | 12 | 3 |

A | S2 | 10 | 2 |

B | S2 | 152 | 17 |

E | S3 | 101 | 23 |

C | S3 | 73 | 12 |

F | S3 | 63 | 19 |

F | S3 | 23 | 4 |

D | S3 | 5 | 0 |

A | S3 | 5 | 0 |

**Desired Output:**

location | median_sales |
---|---|

S1 | 7.5000 |

S2 | 22.5000 |

S3 | 23.0000 |

*Note: SQLite does not support floor and ceil functions, so make sure to try out in MySQL or a db that supports these functions.*

Solve Hands-On: HERE, Table Schema: Gist

Show Solutionselect location, avg(sales) as median_sales from (select * from (select *, row_number() over (partition by location order by sales) as rn, count(*) over (partition by location) as total_rows from demand) as tbl where rn in (floor(total_rows/2), ceil(total_rows/2))) as sub group by location;

## Q15. Which product has the largest refund rate overall?

Difficulty: Intermediate

**Task:**

Find out which product had the highest refund rate and how much.

**Input:**

product | location | sales | refunds |
---|---|---|---|

A | S1 | 9 | 2 |

B | S1 | 8 | 1 |

B | S1 | 7 | 3 |

B | S2 | 51 | 8 |

C | S2 | 33 | 10 |

D | S2 | 12 | 3 |

A | S2 | 10 | 2 |

B | S2 | 152 | 17 |

E | S3 | 101 | 23 |

C | S3 | 73 | 12 |

F | S3 | 63 | 19 |

F | S3 | 23 | 4 |

D | S3 | 5 | 0 |

A | S3 | 5 | 0 |

**Desired Output:**

product | total_sales | total_refunds | refund_rate |
---|---|---|---|

F | 86 | 23 | 0.267442 |

Solve Hands-On: HERE, Table Schema: Gist

Show Solutionwith tbl as (select *, cast(total_refunds/total_sales as float) as refund_rate from ( select product, sum(sales) as total_sales, sum(refunds) as total_refunds from demand group by product ) as sub order by refund_rate desc) select * from tbl where refund_rate >= (select max(refund_rate) from tbl);