Menu

SQL Window Functions Exercises – Practice to perfection

SQL window functions is considered a ‘hard’ concept in SQL. This set of exercises is designed to challenge your SQL muscle and help internalize data wrangling using window functions in SQL. The questions are of three levels of difficulty: Easy, Intermediate and Hard.

If you haven’t read the tutorial yet, read SQL Window Functions – Made Simple and Intuitive.

Try solving hands-on in MySQL using the link provided in each question.

Author’s note: The solutions provided is (mostly) one of the several possible solutions. The goal is not to conform with the provided solution. Try reach the desired output shown.

Q1. Find the running cumulative total demand.

Difficulty Level: Intermediate

Task:

From the demand2 table, find the cumulative total sum for qty.

Input:

SELECT * FROM demand2;
day qty
1 10
2 6
3 21
4 9
6 12
7 18
8 3
9 6
10 23

Desired Output:

day qty cumQty
1 10 10
2 6 16
3 21 37
4 9 46
6 12 58
7 18 76
8 3 79
9 6 85
10 23 108

Solve Hands-On: HERE, Table Schema and data: Gist

Show Solution
SELECT day, 
       qty,
       SUM(qty) OVER(ORDER BY day) as cumQty 
FROM demand2;

Q2. Find the running cumulative total demand by product.

Difficulty Level: Intermediate

Task:

From the demand table, find the cumulative total sum for qty for each product category.

Input:

product day qty
A 1 10
A 2 6
A 3 21
A 4 9
A 5 19
B 1 12
B 2 18
B 3 3
B 4 6
B 5 23

Desired Solution:

product day qty CUMSUM
A 1 10 10
A 2 6 16
A 3 21 37
A 4 9 46
A 5 19 65
B 1 12 12
B 2 18 30
B 3 3 33
B 4 6 39
B 5 23 62

Solve Hands-On: HERE, Table Schema and data: Gist

Show Solution
SELECT product,
	day,
        qty,
        SUM(qty) OVER(PARTITION BY product ORDER BY day) as CUMSUM
FROM demand;

Q3. When are the top 2 worst performing days for each product?

Difficulty Level: Intermediate

Task:

Extract the two worst performing days of each product in terms of number of qty sold. Paraphrasing it: Get the days corresponding to the two minimum most values of qty for each product.

Input:

product day qty
A 1 10
A 2 6
A 3 21
A 4 9
A 5 19
B 1 12
B 2 18
B 3 3
B 4 6
B 5 23

Desired Solution:

product day qty RN
A 2 6 1
A 4 9 2
B 3 3 1
B 4 6 2

Solve Hands-On: HERE, Table Schema: Gist

Show Solution
with tbl as (SELECT PRODUCT,
                    qty,
                    day,
                    row_number() over (partition by product order by qty) as RN
FROM demand)
select product, day, qty
from tbl a
where RN in (1,2)

Q4. Find the percentage increase in qty compared to the previous day.

Difficulty Level: Intermediate

Task:

Sort the table by qty for each product and compute the percentage increase (or decrease) compared to the previous day.

Input:

product day qty
A 1 10
A 2 6
A 3 21
A 4 9
A 5 19
B 1 12
B 2 18
B 3 3
B 4 6
B 5 23

Desired Solution:

product day qty qty_lag perc_increase
A 2 6 10 -40
A 3 21 6 250
A 4 9 21 -57.14
A 5 19 9 111.11
B 2 18 12 50
B 3 3 18 -83.33
B 4 6 3 100
B 5 23 6 283.33

Solve Hands-On: HERE, Table Schema: Gist

Show Solution
with tbl as (select product, 
                    day, 
                    qty, 
                    lag(qty, 1) over (partition by product order by day) as qty_lag
from demand)
select *,
round(((qty - qty_lag)/qty_lag) * 100, 2) as perc_increase
from tbl
where qty_lag is not null;

Q5. Show the minimum and maximum ‘qty’ sold for each product as separate columns.

Difficulty Level: Easy

Task:

Create two new columns in the table that shows the minimum and the maximum quantity sold for each product.

Input:

product day qty
A 1 10
A 2 6
A 3 21
A 4 9
A 5 19
B 1 12
B 2 18
B 3 3
B 4 6
B 5 23

Desired Solution:

product day qty min_qty max_qty
A 1 10 6 21
A 2 6 6 21
A 3 21 6 21
A 4 9 6 21
A 5 19 6 21
B 1 12 3 23
B 2 18 3 23
B 3 3 3 23
B 4 6 3 23
B 5 23 3 23

Solve Hands-On: HERE, Table Schema: Gist

Show Solution
select *,
        min(qty) over(partition by product) as min_qty, 
        max(qty) over(partition by product) as max_qty 
from
demand;

Q6. Calculate the difference between the second largest and the second smallest sales qty

Difficulty Level: Hard

Task:

Calculate the diffence between the second largest and the second smallest sales qty for each product.

Input:

product day qty
A 1 10
A 2 6
A 3 21
A 4 9
A 5 19
B 1 12
B 2 18
B 3 3
B 4 6
B 5 23

Desired Output:

product day qty
A 4 9
A 5 19
B 4 6
B 2 18

Solve Hands-On: HERE, Table Schema: Gist

Show Solution
select product, 
 day, 
 qty
from
  (select *,
  row_number() over (partition by product order by qty) as rownum, 
  count(*) over (partition by product) as total_recs
  from
  demand) a
where (rownum = 2) or rownum = (total_recs - 1);

Q7. On each day, which product had the highest sales?

Difficulty Level: Intermediate

Task:

Create a table to show the day and the names of the product the the highest qty sale.

Input:

product day qty
A 1 10
A 2 6
A 3 21
A 4 9
A 5 19
B 1 12
B 2 18
B 3 3
B 4 6
B 5 23

Desired Output:

day product qty maxqty
1 B 12 12
2 B 18 18
3 A 21 21
4 A 9 9
5 B 23 23

Solve Hands-On: HERE, Table Schema: Gist

Show Solution
with tbl as (
  select day, 
          product,
          qty,
          max(qty) over (partition by day) as maxqty
  from demand)
select day,
	product 
from tbl
where qty = maxqty;

Questions numbers Q8 to Q15 uses the same table as below.

Q8. Create row numbers in increasing order of sales.

Difficulty Level: Intermediate

Task:

Create row numbers in increasing order of sales, starting with 1 in each location

Input:

product location sales refunds
A S1 9 2
B S1 8 1
B S1 7 3
B S2 51 8
C S2 33 10
D S2 12 3
A S2 10 2
B S2 152 17
E S3 101 23
C S3 73 12
F S3 63 19
F S3 23 4
D S3 5 0
A S3 5 0

Desired Output:

product location sales refunds rn
B S1 7 3 1
B S1 8 1 2
A S1 9 2 3
A S2 10 2 1
D S2 12 3 2
C S2 33 10 3
B S2 51 8 4
B S2 152 17 5
D S3 5 0 1
A S3 5 0 2
F S3 23 4 3
F S3 63 19 4
C S3 73 12 5
E S3 101 23 6

Solve Hands-On: HERE, Table Schema: Gist

Show Solution
select *,
row_number() over(partition by location order by sales) as rn
from demand;

Q9. Find the top products (Rank 1 and 2) within each location

Difficulty: Intermediate

Task:

Extract the names of top selling products (rank 1 and 2) in each location. That is, the products with highest and second highest sales in each region.

Input:

product location sales refunds
A S1 9 2
B S1 8 1
B S1 7 3
B S2 51 8
C S2 33 10
D S2 12 3
A S2 10 2
B S2 152 17
E S3 101 23
C S3 73 12
F S3 63 19
F S3 23 4
D S3 5 0
A S3 5 0

Desired Output:

location product total_sales rn
S1 B 15 1
S1 A 9 2
S2 B 203 1
S2 C 33 2
S3 E 101 1
S3 F 86 2

Solve Hands-On: HERE, Table Schema: Gist

Show Solution
drop table if exists temp;

create table temp as
with tbl as (select location, 
                    product,
                    sum(sales) as total_sales
from demand
group by location, product
order by location, total_sales desc)
select *, 
row_number() over (partition by location order by total_sales desc) as rn
from tbl;

select * from temp
where rn <= 2;

Q10. What is the total sales from Top 3 products in each location?

Difficulty: Intermediate

Task:

Calculate the total sales from the top 3 best selling products in each location.

Input:

product location sales refunds
A S1 9 2
B S1 8 1
B S1 7 3
B S2 51 8
C S2 33 10
D S2 12 3
A S2 10 2
B S2 152 17
E S3 101 23
C S3 73 12
F S3 63 19
F S3 23 4
D S3 5 0
A S3 5 0

Desired Output:

location product total_sales rn
S1 B 15 1
S1 A 9 2
S2 B 203 1
S2 C 33 2
S2 D 12 3
S3 E 101 1
S3 F 86 2
S3 C 73 3

Solve Hands-On: HERE, Table Schema and data: Gist

Show Solution
select location,
		product,
        total_sales,
        rn
from 
(select *, 
	row_number() over(partition by location order by total_sales desc) as rn
from (select location, 
              product, 
              sum(sales) as total_sales
      from demand
      group by location, product
      order by location, total_sales desc)
 ) 
 where rn <= 3;

Q11. Calculate the proportion of sales from each location

Difficulty: Intermediate

Task:

Calculate the proportion of sales from each location and show this against each record.

Input:

product location sales refunds
A S1 9 2
B S1 8 1
B S1 7 3
B S2 51 8
C S2 33 10
D S2 12 3
A S2 10 2
B S2 152 17
E S3 101 23
C S3 73 12
F S3 63 19
F S3 23 4
D S3 5 0
A S3 5 0

Desired Output:

product location sales refunds total_location total_sales ratio
A S1 9 2 24 552 0.0435
B S1 8 1 24 552 0.0435
B S1 7 3 24 552 0.0435
B S2 51 8 258 552 0.4674
C S2 33 10 258 552 0.4674
D S2 12 3 258 552 0.4674
A S2 10 2 258 552 0.4674
B S2 152 17 258 552 0.4674
E S3 101 23 270 552 0.4891
C S3 73 12 270 552 0.4891
F S3 63 19 270 552 0.4891
F S3 23 4 270 552 0.4891
D S3 5 0 270 552 0.4891
A S3 5 0 270 552 0.4891
Show Solution
SELECT *,
total_location,
round(cast(total_location as float) / cast(total_sales as float), 4) as ratio
from
(
  SELECT *,
  sum(sales) over(partition by location rows between unbounded preceding and unbounded following) as total_location, 
  sum(sales) over(rows between unbounded preceding and unbounded following) as total_sales 
  FROM demand
);

Q12. Which products contribute to top 80% of total sales?

Difficulty: Hard

Task:

Calculate and extract the best selling products whose total sales does not exceed 80% of overall sales. Show the cumulative percentage contribution to sales as a percentage as well.

Input:

product location sales refunds
A S1 9 2
B S1 8 1
B S1 7 3
B S2 51 8
C S2 33 10
D S2 12 3
A S2 10 2
B S2 152 17
E S3 101 23
C S3 73 12
F S3 63 19
F S3 23 4
D S3 5 0
A S3 5 0

Desired Output:

product rn total_product_sales cum_total_sales total_sales cum_perc_sales
D 1 17 17 552 0.03
A 2 24 41 552 0.074
F 3 86 127 552 0.23
E 4 101 228 552 0.41
C 5 106 334 552 0.60

Solve Hands-On: HERE, Table Schema: Gist

Show Solution
SELECT *,
round((cum_total_sales/total_sales), 4) as cum_perc_sales
FROM
(select product,
	row_number() over (order by total_product_sales) as rn,
        total_product_sales,
        sum(total_product_sales) over (order by total_product_sales) as cum_total_sales,
 	sum(total_product_sales) over (rows between unbounded preceding and unbounded following) as total_sales
from
  (select *, 
  sum(sales) as total_product_sales
  from demand
  group by product
  order by total_product_sales desc
  )
)
where perc_sales <= 0.8;

Q13. What is the median value of sales overall?

Difficulty: Intermediate

Task:

Calculate the median sales value overall irrespective of product or location.

Input:

product location sales refunds
A S1 9 2
B S1 8 1
B S1 7 3
B S2 51 8
C S2 33 10
D S2 12 3
A S2 10 2
B S2 152 17
E S3 101 23
C S3 73 12
F S3 63 19
F S3 23 4
D S3 5 0
A S3 5 0

Desired Output:

product location sales refunds
D S2 12 3

Solve Hands-On: HERE, Table Schema: Gist

Note: SQLite does not support floor and ceil functions, so make sure to try out in MySQL or a db that supports these functions.
ion;
```

Show Solution
select location, 
       avg(sales) as median
from
  (select *,
  row_number() over (order by sales) as rn,
  count(*) over () as total_rows
  from demand) as tbl
where rn in (floor(total_rows/2), ceil(total_rows/2))
group by location;

Q14. What is the median value of sales for each location?

Difficulty: Hard

Task:

Calculate the median sales value across all product within each location.

Input:

product location sales refunds
A S1 9 2
B S1 8 1
B S1 7 3
B S2 51 8
C S2 33 10
D S2 12 3
A S2 10 2
B S2 152 17
E S3 101 23
C S3 73 12
F S3 63 19
F S3 23 4
D S3 5 0
A S3 5 0

Desired Output:

location median_sales
S1 7.5000
S2 22.5000
S3 23.0000

Note: SQLite does not support floor and ceil functions, so make sure to try out in MySQL or a db that supports these functions.

Solve Hands-On: HERE, Table Schema: Gist

Show Solution
select location, 
       avg(sales) as median_sales
from
 (select *
  from
        (select *,
        row_number() over (partition by location order by sales) as rn,
        count(*) over (partition by location) as total_rows
        from demand) as tbl
where rn in (floor(total_rows/2), ceil(total_rows/2))) as sub
group by location;

Q15. Which product has the largest refund rate overall?

Difficulty: Intermediate

Task:

Find out which product had the highest refund rate and how much.

Input:

product location sales refunds
A S1 9 2
B S1 8 1
B S1 7 3
B S2 51 8
C S2 33 10
D S2 12 3
A S2 10 2
B S2 152 17
E S3 101 23
C S3 73 12
F S3 63 19
F S3 23 4
D S3 5 0
A S3 5 0

Desired Output:

product total_sales total_refunds refund_rate
F 86 23 0.267442

Solve Hands-On: HERE, Table Schema: Gist

Show Solution
with tbl as (select *,
      cast(total_refunds/total_sales as float) as refund_rate
  from
  (
    select product,
            sum(sales) as total_sales,
            sum(refunds) as total_refunds
    from demand
    group by product
  ) as sub
  order by refund_rate desc)
  select * from tbl
  where refund_rate >= (select max(refund_rate) from tbl);

Course Preview

Machine Learning A-Z™: Hands-On Python & R In Data Science

Free Sample Videos:

Machine Learning A-Z™: Hands-On Python & R In Data Science

Machine Learning A-Z™: Hands-On Python & R In Data Science

Machine Learning A-Z™: Hands-On Python & R In Data Science

Machine Learning A-Z™: Hands-On Python & R In Data Science

Machine Learning A-Z™: Hands-On Python & R In Data Science