SQL window functions is considered a ‘hard’ concept in SQL. This set of exercises is designed to challenge your SQL muscle and help internalize data wrangling using window functions in SQL. The questions are of three levels of difficulty: Easy, Intermediate and Hard.
If you haven’t read the tutorial yet, read SQL Window Functions – Made Simple and Intuitive.
Try solving hands-on in MySQL using the link provided in each question.
Author’s note: The solutions provided is (mostly) one of the several possible solutions. The goal is not to conform with the provided solution. Try reach the desired output shown.
Q1. Find the running cumulative total demand.
Difficulty Level: Intermediate
Task:
From the demand2
table, find the cumulative total sum for qty
.
Input:
SELECT * FROM demand2;
day | qty |
---|---|
1 | 10 |
2 | 6 |
3 | 21 |
4 | 9 |
6 | 12 |
7 | 18 |
8 | 3 |
9 | 6 |
10 | 23 |
Desired Output:
day | qty | cumQty |
---|---|---|
1 | 10 | 10 |
2 | 6 | 16 |
3 | 21 | 37 |
4 | 9 | 46 |
6 | 12 | 58 |
7 | 18 | 76 |
8 | 3 | 79 |
9 | 6 | 85 |
10 | 23 | 108 |
Solve Hands-On: HERE, Table Schema and data: Gist
Show SolutionSELECT day, qty, SUM(qty) OVER(ORDER BY day) as cumQty FROM demand2;
Q2. Find the running cumulative total demand by product.
Difficulty Level: Intermediate
Task:
From the demand
table, find the cumulative total sum for qty
for each product
category.
Input:
product | day | qty |
---|---|---|
A | 1 | 10 |
A | 2 | 6 |
A | 3 | 21 |
A | 4 | 9 |
A | 5 | 19 |
B | 1 | 12 |
B | 2 | 18 |
B | 3 | 3 |
B | 4 | 6 |
B | 5 | 23 |
Desired Solution:
product | day | qty | CUMSUM |
---|---|---|---|
A | 1 | 10 | 10 |
A | 2 | 6 | 16 |
A | 3 | 21 | 37 |
A | 4 | 9 | 46 |
A | 5 | 19 | 65 |
B | 1 | 12 | 12 |
B | 2 | 18 | 30 |
B | 3 | 3 | 33 |
B | 4 | 6 | 39 |
B | 5 | 23 | 62 |
Solve Hands-On: HERE, Table Schema and data: Gist
Show SolutionSELECT product, day, qty, SUM(qty) OVER(PARTITION BY product ORDER BY day) as CUMSUM FROM demand;
Q3. When are the top 2 worst performing days for each product?
Difficulty Level: Intermediate
Task:
Extract the two worst performing days of each product in terms of number of qty
sold. Paraphrasing it: Get the days corresponding to the two minimum most values of qty
for each product.
Input:
product | day | qty |
---|---|---|
A | 1 | 10 |
A | 2 | 6 |
A | 3 | 21 |
A | 4 | 9 |
A | 5 | 19 |
B | 1 | 12 |
B | 2 | 18 |
B | 3 | 3 |
B | 4 | 6 |
B | 5 | 23 |
Desired Solution:
product | day | qty | RN |
---|---|---|---|
A | 2 | 6 | 1 |
A | 4 | 9 | 2 |
B | 3 | 3 | 1 |
B | 4 | 6 | 2 |
Solve Hands-On: HERE, Table Schema: Gist
Show Solutionwith tbl as (SELECT PRODUCT, qty, day, row_number() over (partition by product order by qty) as RN FROM demand) select product, day, qty from tbl a where RN in (1,2)
Q4. Find the percentage increase in qty
compared to the previous day.
Difficulty Level: Intermediate
Task:
Sort the table by qty
for each product and compute the percentage increase (or decrease) compared to the previous day.
Input:
product | day | qty |
---|---|---|
A | 1 | 10 |
A | 2 | 6 |
A | 3 | 21 |
A | 4 | 9 |
A | 5 | 19 |
B | 1 | 12 |
B | 2 | 18 |
B | 3 | 3 |
B | 4 | 6 |
B | 5 | 23 |
Desired Solution:
product | day | qty | qty_lag | perc_increase |
---|---|---|---|---|
A | 2 | 6 | 10 | -40 |
A | 3 | 21 | 6 | 250 |
A | 4 | 9 | 21 | -57.14 |
A | 5 | 19 | 9 | 111.11 |
B | 2 | 18 | 12 | 50 |
B | 3 | 3 | 18 | -83.33 |
B | 4 | 6 | 3 | 100 |
B | 5 | 23 | 6 | 283.33 |
Solve Hands-On: HERE, Table Schema: Gist
Show Solutionwith tbl as (select product, day, qty, lag(qty, 1) over (partition by product order by day) as qty_lag from demand) select *, round(((qty - qty_lag)/qty_lag) * 100, 2) as perc_increase from tbl where qty_lag is not null;
Q5. Show the minimum and maximum ‘qty’ sold for each product as separate columns.
Difficulty Level: Easy
Task:
Create two new columns in the table that shows the minimum and the maximum quantity sold for each product.
Input:
product | day | qty |
---|---|---|
A | 1 | 10 |
A | 2 | 6 |
A | 3 | 21 |
A | 4 | 9 |
A | 5 | 19 |
B | 1 | 12 |
B | 2 | 18 |
B | 3 | 3 |
B | 4 | 6 |
B | 5 | 23 |
Desired Solution:
product | day | qty | min_qty | max_qty |
---|---|---|---|---|
A | 1 | 10 | 6 | 21 |
A | 2 | 6 | 6 | 21 |
A | 3 | 21 | 6 | 21 |
A | 4 | 9 | 6 | 21 |
A | 5 | 19 | 6 | 21 |
B | 1 | 12 | 3 | 23 |
B | 2 | 18 | 3 | 23 |
B | 3 | 3 | 3 | 23 |
B | 4 | 6 | 3 | 23 |
B | 5 | 23 | 3 | 23 |
Solve Hands-On: HERE, Table Schema: Gist
Show Solutionselect *, min(qty) over(partition by product) as min_qty, max(qty) over(partition by product) as max_qty from demand;
Q6. Calculate the difference between the second largest and the second smallest sales qty
Difficulty Level: Hard
Task:
Calculate the diffence between the second largest and the second smallest sales qty
for each product.
Input:
product | day | qty |
---|---|---|
A | 1 | 10 |
A | 2 | 6 |
A | 3 | 21 |
A | 4 | 9 |
A | 5 | 19 |
B | 1 | 12 |
B | 2 | 18 |
B | 3 | 3 |
B | 4 | 6 |
B | 5 | 23 |
Desired Output:
product | day | qty |
---|---|---|
A | 4 | 9 |
A | 5 | 19 |
B | 4 | 6 |
B | 2 | 18 |
Solve Hands-On: HERE, Table Schema: Gist
Show Solutionselect product, day, qty from (select *, row_number() over (partition by product order by qty) as rownum, count(*) over (partition by product) as total_recs from demand) a where (rownum = 2) or rownum = (total_recs - 1);
Q7. On each day, which product had the highest sales?
Difficulty Level: Intermediate
Task:
Create a table to show the day
and the names of the product the the highest qty
sale.
Input:
product | day | qty |
---|---|---|
A | 1 | 10 |
A | 2 | 6 |
A | 3 | 21 |
A | 4 | 9 |
A | 5 | 19 |
B | 1 | 12 |
B | 2 | 18 |
B | 3 | 3 |
B | 4 | 6 |
B | 5 | 23 |
Desired Output:
day | product | qty | maxqty |
---|---|---|---|
1 | B | 12 | 12 |
2 | B | 18 | 18 |
3 | A | 21 | 21 |
4 | A | 9 | 9 |
5 | B | 23 | 23 |
Solve Hands-On: HERE, Table Schema: Gist
Show Solutionwith tbl as ( select day, product, qty, max(qty) over (partition by day) as maxqty from demand) select day, product from tbl where qty = maxqty;
Questions numbers Q8 to Q15 uses the same table as below.
Q8. Create row numbers in increasing order of sales.
Difficulty Level: Intermediate
Task:
Create row numbers in increasing order of sales
, starting with 1 in each location
Input:
product | location | sales | refunds |
---|---|---|---|
A | S1 | 9 | 2 |
B | S1 | 8 | 1 |
B | S1 | 7 | 3 |
B | S2 | 51 | 8 |
C | S2 | 33 | 10 |
D | S2 | 12 | 3 |
A | S2 | 10 | 2 |
B | S2 | 152 | 17 |
E | S3 | 101 | 23 |
C | S3 | 73 | 12 |
F | S3 | 63 | 19 |
F | S3 | 23 | 4 |
D | S3 | 5 | 0 |
A | S3 | 5 | 0 |
Desired Output:
product | location | sales | refunds | rn |
---|---|---|---|---|
B | S1 | 7 | 3 | 1 |
B | S1 | 8 | 1 | 2 |
A | S1 | 9 | 2 | 3 |
A | S2 | 10 | 2 | 1 |
D | S2 | 12 | 3 | 2 |
C | S2 | 33 | 10 | 3 |
B | S2 | 51 | 8 | 4 |
B | S2 | 152 | 17 | 5 |
D | S3 | 5 | 0 | 1 |
A | S3 | 5 | 0 | 2 |
F | S3 | 23 | 4 | 3 |
F | S3 | 63 | 19 | 4 |
C | S3 | 73 | 12 | 5 |
E | S3 | 101 | 23 | 6 |
Solve Hands-On: HERE, Table Schema: Gist
Show Solutionselect *, row_number() over(partition by location order by sales) as rn from demand;
Q9. Find the top products (Rank 1 and 2) within each location
Difficulty: Intermediate
Task:
Extract the names of top selling products (rank 1 and 2) in each location. That is, the products with highest and second highest sales in each region.
Input:
product | location | sales | refunds |
---|---|---|---|
A | S1 | 9 | 2 |
B | S1 | 8 | 1 |
B | S1 | 7 | 3 |
B | S2 | 51 | 8 |
C | S2 | 33 | 10 |
D | S2 | 12 | 3 |
A | S2 | 10 | 2 |
B | S2 | 152 | 17 |
E | S3 | 101 | 23 |
C | S3 | 73 | 12 |
F | S3 | 63 | 19 |
F | S3 | 23 | 4 |
D | S3 | 5 | 0 |
A | S3 | 5 | 0 |
Desired Output:
location | product | total_sales | rn |
---|---|---|---|
S1 | B | 15 | 1 |
S1 | A | 9 | 2 |
S2 | B | 203 | 1 |
S2 | C | 33 | 2 |
S3 | E | 101 | 1 |
S3 | F | 86 | 2 |
Solve Hands-On: HERE, Table Schema: Gist
Show Solutiondrop table if exists temp; create table temp as with tbl as (select location, product, sum(sales) as total_sales from demand group by location, product order by location, total_sales desc) select *, row_number() over (partition by location order by total_sales desc) as rn from tbl; select * from temp where rn <= 2;
Q10. What is the total sales from Top 3 products in each location?
Difficulty: Intermediate
Task:
Calculate the total sales from the top 3 best selling products in each location.
Input:
product | location | sales | refunds |
---|---|---|---|
A | S1 | 9 | 2 |
B | S1 | 8 | 1 |
B | S1 | 7 | 3 |
B | S2 | 51 | 8 |
C | S2 | 33 | 10 |
D | S2 | 12 | 3 |
A | S2 | 10 | 2 |
B | S2 | 152 | 17 |
E | S3 | 101 | 23 |
C | S3 | 73 | 12 |
F | S3 | 63 | 19 |
F | S3 | 23 | 4 |
D | S3 | 5 | 0 |
A | S3 | 5 | 0 |
Desired Output:
location | product | total_sales | rn |
---|---|---|---|
S1 | B | 15 | 1 |
S1 | A | 9 | 2 |
S2 | B | 203 | 1 |
S2 | C | 33 | 2 |
S2 | D | 12 | 3 |
S3 | E | 101 | 1 |
S3 | F | 86 | 2 |
S3 | C | 73 | 3 |
Solve Hands-On: HERE, Table Schema and data: Gist
Show Solutionselect location, product, total_sales, rn from (select *, row_number() over(partition by location order by total_sales desc) as rn from (select location, product, sum(sales) as total_sales from demand group by location, product order by location, total_sales desc) ) where rn <= 3;
Q11. Calculate the proportion of sales from each location
Difficulty: Intermediate
Task:
Calculate the proportion of sales from each location and show this against each record.
Input:
product | location | sales | refunds |
---|---|---|---|
A | S1 | 9 | 2 |
B | S1 | 8 | 1 |
B | S1 | 7 | 3 |
B | S2 | 51 | 8 |
C | S2 | 33 | 10 |
D | S2 | 12 | 3 |
A | S2 | 10 | 2 |
B | S2 | 152 | 17 |
E | S3 | 101 | 23 |
C | S3 | 73 | 12 |
F | S3 | 63 | 19 |
F | S3 | 23 | 4 |
D | S3 | 5 | 0 |
A | S3 | 5 | 0 |
Desired Output:
product | location | sales | refunds | total_location | total_sales | ratio |
---|---|---|---|---|---|---|
A | S1 | 9 | 2 | 24 | 552 | 0.0435 |
B | S1 | 8 | 1 | 24 | 552 | 0.0435 |
B | S1 | 7 | 3 | 24 | 552 | 0.0435 |
B | S2 | 51 | 8 | 258 | 552 | 0.4674 |
C | S2 | 33 | 10 | 258 | 552 | 0.4674 |
D | S2 | 12 | 3 | 258 | 552 | 0.4674 |
A | S2 | 10 | 2 | 258 | 552 | 0.4674 |
B | S2 | 152 | 17 | 258 | 552 | 0.4674 |
E | S3 | 101 | 23 | 270 | 552 | 0.4891 |
C | S3 | 73 | 12 | 270 | 552 | 0.4891 |
F | S3 | 63 | 19 | 270 | 552 | 0.4891 |
F | S3 | 23 | 4 | 270 | 552 | 0.4891 |
D | S3 | 5 | 0 | 270 | 552 | 0.4891 |
A | S3 | 5 | 0 | 270 | 552 | 0.4891 |
SELECT *, total_location, round(cast(total_location as float) / cast(total_sales as float), 4) as ratio from ( SELECT *, sum(sales) over(partition by location rows between unbounded preceding and unbounded following) as total_location, sum(sales) over(rows between unbounded preceding and unbounded following) as total_sales FROM demand );
Q12. Which products contribute to top 80% of total sales?
Difficulty: Hard
Task:
Calculate and extract the best selling products whose total sales does not exceed 80% of overall sales. Show the cumulative percentage contribution to sales as a percentage as well.
Input:
product | location | sales | refunds |
---|---|---|---|
A | S1 | 9 | 2 |
B | S1 | 8 | 1 |
B | S1 | 7 | 3 |
B | S2 | 51 | 8 |
C | S2 | 33 | 10 |
D | S2 | 12 | 3 |
A | S2 | 10 | 2 |
B | S2 | 152 | 17 |
E | S3 | 101 | 23 |
C | S3 | 73 | 12 |
F | S3 | 63 | 19 |
F | S3 | 23 | 4 |
D | S3 | 5 | 0 |
A | S3 | 5 | 0 |
Desired Output:
product | rn | total_product_sales | cum_total_sales | total_sales | cum_perc_sales |
---|---|---|---|---|---|
D | 1 | 17 | 17 | 552 | 0.03 |
A | 2 | 24 | 41 | 552 | 0.074 |
F | 3 | 86 | 127 | 552 | 0.23 |
E | 4 | 101 | 228 | 552 | 0.41 |
C | 5 | 106 | 334 | 552 | 0.60 |
Solve Hands-On: HERE, Table Schema: Gist
Show SolutionSELECT *, round((cum_total_sales/total_sales), 4) as cum_perc_sales FROM (select product, row_number() over (order by total_product_sales) as rn, total_product_sales, sum(total_product_sales) over (order by total_product_sales) as cum_total_sales, sum(total_product_sales) over (rows between unbounded preceding and unbounded following) as total_sales from (select *, sum(sales) as total_product_sales from demand group by product order by total_product_sales desc ) ) where perc_sales <= 0.8;
Q13. What is the median value of sales overall?
Difficulty: Intermediate
Task:
Calculate the median sales value overall irrespective of product or location.
Input:
product | location | sales | refunds |
---|---|---|---|
A | S1 | 9 | 2 |
B | S1 | 8 | 1 |
B | S1 | 7 | 3 |
B | S2 | 51 | 8 |
C | S2 | 33 | 10 |
D | S2 | 12 | 3 |
A | S2 | 10 | 2 |
B | S2 | 152 | 17 |
E | S3 | 101 | 23 |
C | S3 | 73 | 12 |
F | S3 | 63 | 19 |
F | S3 | 23 | 4 |
D | S3 | 5 | 0 |
A | S3 | 5 | 0 |
Desired Output:
product | location | sales | refunds |
---|---|---|---|
D | S2 | 12 | 3 |
Solve Hands-On: HERE, Table Schema: Gist
Note: SQLite does not support floor and ceil functions, so make sure to try out in MySQL or a db that supports these functions.
ion;
```
select location, avg(sales) as median from (select *, row_number() over (order by sales) as rn, count(*) over () as total_rows from demand) as tbl where rn in (floor(total_rows/2), ceil(total_rows/2)) group by location;
Q14. What is the median value of sales for each location?
Difficulty: Hard
Task:
Calculate the median sales value across all product within each location.
Input:
product | location | sales | refunds |
---|---|---|---|
A | S1 | 9 | 2 |
B | S1 | 8 | 1 |
B | S1 | 7 | 3 |
B | S2 | 51 | 8 |
C | S2 | 33 | 10 |
D | S2 | 12 | 3 |
A | S2 | 10 | 2 |
B | S2 | 152 | 17 |
E | S3 | 101 | 23 |
C | S3 | 73 | 12 |
F | S3 | 63 | 19 |
F | S3 | 23 | 4 |
D | S3 | 5 | 0 |
A | S3 | 5 | 0 |
Desired Output:
location | median_sales |
---|---|
S1 | 7.5000 |
S2 | 22.5000 |
S3 | 23.0000 |
Note: SQLite does not support floor and ceil functions, so make sure to try out in MySQL or a db that supports these functions.
Solve Hands-On: HERE, Table Schema: Gist
Show Solutionselect location, avg(sales) as median_sales from (select * from (select *, row_number() over (partition by location order by sales) as rn, count(*) over (partition by location) as total_rows from demand) as tbl where rn in (floor(total_rows/2), ceil(total_rows/2))) as sub group by location;
Q15. Which product has the largest refund rate overall?
Difficulty: Intermediate
Task:
Find out which product had the highest refund rate and how much.
Input:
product | location | sales | refunds |
---|---|---|---|
A | S1 | 9 | 2 |
B | S1 | 8 | 1 |
B | S1 | 7 | 3 |
B | S2 | 51 | 8 |
C | S2 | 33 | 10 |
D | S2 | 12 | 3 |
A | S2 | 10 | 2 |
B | S2 | 152 | 17 |
E | S3 | 101 | 23 |
C | S3 | 73 | 12 |
F | S3 | 63 | 19 |
F | S3 | 23 | 4 |
D | S3 | 5 | 0 |
A | S3 | 5 | 0 |
Desired Output:
product | total_sales | total_refunds | refund_rate |
---|---|---|---|
F | 86 | 23 | 0.267442 |
Solve Hands-On: HERE, Table Schema: Gist
Show Solutionwith tbl as (select *, cast(total_refunds/total_sales as float) as refund_rate from ( select product, sum(sales) as total_sales, sum(refunds) as total_refunds from demand group by product ) as sub order by refund_rate desc) select * from tbl where refund_rate >= (select max(refund_rate) from tbl);