SQL window functions is one of the advanced concepts, understanding which will give you the ability to do complex data wrangling and transformations in SQL.
In this guide, we will intuitively understand how window functions work in a way you will never forget. Don’t memorize anything, just read through and you will clearly understand how the window functions works. Then you will know when and how to use it.
Once you complete this guide, head over to SQL Window Function Exercises and solve all the questions there.
Let’s get started.
In this on we will cover:
- Understanding Window Functions
- Type 1: Aggregation
2.1. Aggregation functions used with Window functions - Type 2: Ranking
3.1. Ranking Functions - Difference between
rank()
anddense_rank()
? - Type 3: Comparing boundaries with Offsets
5.1. Offset Functions you will need - Type 4: Rolling Aggregates with ROWS BETWEEN clause
6.1. Understanding ROWS BETWEEN Clause
1. Understanding Window Functions
Window function performs a calculation across a ‘window of rows’ in relation to the current row. You get to define how these Windows, also called partitions, are created. You will see how to do that shortly.
This is comparable to ‘group by’, but instead of returning one row for a group, it will return result for each row based on the ‘window’ partitions that you define.
Initially, it will appears like the applications of window functions is vast and difficult to learn. But essentially, there are only four types of window functions operations.
By understanding these 4 types, you will understand nearly all applications of Window functions:
- Aggregation: You can create sums, averages, counts, max, min, etc. across related sets of data. For example, calculating a running total of sales.
-
Ranking: You can rank items in your data set. For example, listing top salespeople, top 10 students, etc.
-
Value comparison: You can compare values in a data set. For example, compare the current value with the previous value to find trends or patterns.
-
Rolling Aggregates: Comparing specific ranks of values and aggregating if needed. You will learn the
ROWS BETWEEN
clause here.
Let’s use the students Table
created below to illustrate the Window Function concept. You can create this table using the following schema.
Consider practicing the examples in this tutorial hands-on here.
select * from students;
student_id | subject | score |
---|---|---|
1 | Math | 85 |
1 | English | 92 |
1 | Science | 87 |
2 | Math | 91 |
2 | English | 88 |
2 | Science | 93 |
3 | Math | 78 |
3 | English | 85 |
3 | Science | 90 |
4 | Math | 92 |
4 | English | 86 |
4 | Science | 89 |
5 | Math | 90 |
5 | English | 88 |
5 | Science | 85 |
6 | Math | 93 |
6 | English | 82 |
6 | Science | 90 |
2. Type 1: Aggregation
This is similar to grouping using Group By
, but instead of aggregating, we want to create a new column, without reducing the number of rows in the dataset.
Let’s understand with a solid example.
Task:
How to compute the average score for each student (and have the average shown as a separate column)?
Solution:
The usual way to do this is to do a ‘group by’ using student_id
to get the average score for each student. Then, ‘left join’ it with the base table. This is what you typically do if you are not comfortable with Window functions.
You can achieve the same effect using Window function in one step. In the process let’s also see the syntax.
Notice the AVG(SCORE) OVER W
part where W
is the window you can define later.
SELECT *,
Avg(score) over w as Average_Score
FROM STUDENTS
WINDOW w as (PARTITION BY student_id);
student_id | subject | score | Average_Score |
---|---|---|---|
1 | Math | 85 | 88.0 |
1 | English | 92 | 88.0 |
1 | Science | 87 | 88.0 |
2 | Math | 91 | 90.66666666666667 |
2 | English | 88 | 90.66666666666667 |
2 | Science | 93 | 90.66666666666667 |
3 | Math | 78 | 84.33333333333333 |
3 | English | 85 | 84.33333333333333 |
3 | Science | 90 | 84.33333333333333 |
4 | Math | 92 | 89.0 |
4 | English | 86 | 89.0 |
4 | Science | 89 | 89.0 |
5 | Math | 90 | 87.66666666666667 |
5 | English | 88 | 87.66666666666667 |
5 | Science | 85 | 87.66666666666667 |
6 | Math | 93 | 88.33333333333333 |
6 | English | 82 | 88.33333333333333 |
6 | Science | 90 | 88.33333333333333 |
The above command effectively creates partitions for each ‘student_id’. This is similar to groups, but without actually reducing the group to one row.
The AVG(SCORE)
calculation is done on the entire partition (student_id) but is repeatedly done for every row iteratively, instead of grouping by student_id
and returning one value per student_id
.
Hope this is clear? If not, please read that one more time.
Now, The ‘window’ statement in the last line after ‘FROM’ is not accepted in certain databases, such as MS SQL or Oracle SQL. So, better take it to the select statement like below.
SELECT *,
Avg(score) over (PARTITION BY student_id) as Average_Score
FROM STUDENTS
LIMIT 7;
student_id | subject | score | Average_Score |
---|---|---|---|
1 | Math | 85 | 88.0 |
1 | English | 92 | 88.0 |
1 | Science | 87 | 88.0 |
2 | Math | 91 | 90.66666666666667 |
2 | English | 88 | 90.66666666666667 |
2 | Science | 93 | 90.66666666666667 |
3 | Math | 78 | 84.33333333333333 |
2.1. Aggregation functions used with Window functions
Here are the other Aggregation functions used with Window functions. You probably know how they work, else refer to the SQL
- min(value): returns the minimum value across all window rows
- max(value): returns the maximum value
- count(value): returns the count of non-null values
- avg(value): returns the average value
- sum(value): returns the sum total value
- group_concat(value, separator): returns a string combining values using separator (SQLite and MySQL only)
3. Type 2: Ranking
This is the second type of usecase for window functions. This is easy to understand with an example as well.
Task:
How to rank the students in descending order of their Total Score?. Where, Total score is the sum of scores in all three subjects.
Solution:
First compute the total score and put in a separate table.
DROP TABLE IF EXISTS TEMP;
CREATE TABLE TEMP AS
SELECT student_id,
subject,
score,
AVG(score) OVER (PARTITION BY student_id) as average_score,
SUM(score) OVER (PARTITION BY student_id) as total_score
FROM students;
SELECT * FROM TEMP LIMIT 5;
student_id | subject | score | average_score | total_score |
---|---|---|---|---|
1 | Math | 85 | 88.0 | 264 |
1 | English | 92 | 88.0 | 264 |
1 | Science | 87 | 88.0 | 264 |
2 | Math | 91 | 90.66666666666667 | 272 |
2 | English | 88 | 90.66666666666667 | 272 |
Notice what is happening above in SUM(score) OVER (PARTITION BY student_id) as total_score
.
We sum(score)
for each student while iterating through the rows. For each row, the window includes all the rows that belong to the current row’s student_id
. This window is specified using the PARTITION BY student_id
command.
Now, in a similar way, compute the rank in descending order of Total Score.
SELECT *,
RANK() OVER (order by TOTAL_SCORE DESC) as srank
FROM TEMP;
student_id | subject | score | average_score | total_score | srank |
---|---|---|---|---|---|
2 | Math | 91 | 90.66666666666667 | 272 | 1 |
2 | English | 88 | 90.66666666666667 | 272 | 1 |
2 | Science | 93 | 90.66666666666667 | 272 | 1 |
4 | Math | 92 | 89.0 | 267 | 4 |
4 | English | 86 | 89.0 | 267 | 4 |
4 | Science | 89 | 89.0 | 267 | 4 |
6 | Math | 93 | 88.33333333333333 | 265 | 7 |
6 | English | 82 | 88.33333333333333 | 265 | 7 |
6 | Science | 90 | 88.33333333333333 | 265 | 7 |
1 | Math | 85 | 88.0 | 264 | 10 |
1 | English | 92 | 88.0 | 264 | 10 |
1 | Science | 87 | 88.0 | 264 | 10 |
5 | Math | 90 | 87.66666666666667 | 263 | 13 |
5 | English | 88 | 87.66666666666667 | 263 | 13 |
5 | Science | 85 | 87.66666666666667 | 263 | 13 |
3 | Math | 78 | 84.33333333333333 | 253 | 16 |
3 | English | 85 | 84.33333333333333 | 253 | 16 |
3 | Science | 90 | 84.33333333333333 | 253 | 16 |
See how the rows containing the same student_id
contains the same rank?
What if you want to increase the rank number for every row? So that the result becomes: 1,2,3,4 instead of 1,1,1,4?
To do this, we need to state another unique column besides the total score based on which RANK()
can assign the rank.
SELECT *,
RANK() OVER (ORDER BY TOTAL_SCORE DESC, SCORE DESC) as iRANK
FROM TEMP;
student_id | subject | score | average_score | total_score | iRANK |
---|---|---|---|---|---|
2 | Science | 93 | 90.66666666666667 | 272 | 1 |
2 | Math | 91 | 90.66666666666667 | 272 | 2 |
2 | English | 88 | 90.66666666666667 | 272 | 3 |
4 | Math | 92 | 89.0 | 267 | 4 |
4 | Science | 89 | 89.0 | 267 | 5 |
4 | English | 86 | 89.0 | 267 | 6 |
6 | Math | 93 | 88.33333333333333 | 265 | 7 |
6 | Science | 90 | 88.33333333333333 | 265 | 8 |
6 | English | 82 | 88.33333333333333 | 265 | 9 |
1 | English | 92 | 88.0 | 264 | 10 |
1 | Science | 87 | 88.0 | 264 | 11 |
1 | Math | 85 | 88.0 | 264 | 12 |
5 | Math | 90 | 87.66666666666667 | 263 | 13 |
5 | English | 88 | 87.66666666666667 | 263 | 14 |
5 | Science | 85 | 87.66666666666667 | 263 | 15 |
3 | Science | 90 | 84.33333333333333 | 253 | 16 |
3 | English | 85 | 84.33333333333333 | 253 | 17 |
3 | Math | 78 | 84.33333333333333 | 253 | 18 |
3.1. Ranking Functions
- row_number(): returns the row ordinal number
- dense_rank(): returns row rank
- rank(): returns row rank with possible gaps (see below)
- ntile(n): splits all rows into n groups and returns the index of the group that the row belongs to
4. Difference between rank()
and dense_rank()
?
When items have same value, rank() will increment to the row_number()
value when the next value occurs. Whereas, dense_rank()
will increment by only 1.
Example: Trying to rank the following – 10, 10, 10, 17, 17, 29
- rank() output: 1,1,1,4,4,6
- dense_rank() output: 1,1,1,2,2,3
5. Type 3: Comparing boundaries with Offsets
Let’s suppose you want to compare the salaries of adjacent members, that is, you want to find what percentage the current value is larger than the previous value.
Let’t first create the salaries table.
DROP TABLE IF EXISTS SALARIES;
CREATE TABLE salaries (
id INT,
name VARCHAR(20),
city VARCHAR(20),
dept VARCHAR(20),
salary FLOAT
);
INSERT INTO SALARIES (id, name, city, dept, salary) VALUES
(21, 'Dhanya', 'Chennai', 'hr', 75),
(22, 'Bob', 'London', 'hr', 71),
(31, 'Akira', 'Chennai', 'it', 89),
(32, 'Grace', 'Berlin', 'it', 60),
(33, 'Steven', 'London', 'it', 103),
(34, 'Ramesh', 'Chennai', 'it', 103),
(35, 'Frank', 'Berlin', 'it', 120),
(41, 'Cindy', 'Berlin', 'sales', 95),
(42, 'Yetunde', 'London', 'sales', 95),
(43, 'Alice', 'Berlin', 'sales', 100);
Practice Hands-On Here. The table schema is created, you will need to write the queries on the right hand side panel and run.
Task:
How to find the percentage diff increase in salaries of neighbours?
Solution:
We need to have the salaries sorted (by salary ascending) and then look up the previous salary value. Then, use the current and previous value to compute the difference.
Let’s first get the previous value using lag
function.
select id,
name,
city,
dept,
salary,
lag(salary) over (order by id asc) as prev_salary
from salaries;
id | name | city | dept | salary | prev_salary |
---|---|---|---|---|---|
21 | Dhanya | Chennai | hr | 75.0 | None |
22 | Bob | London | hr | 71.0 | 75.0 |
31 | Akira | Chennai | it | 89.0 | 71.0 |
32 | Grace | Berlin | it | 60.0 | 89.0 |
33 | Steven | London | it | 103.0 | 60.0 |
34 | Ramesh | Chennai | it | 103.0 | 103.0 |
35 | Frank | Berlin | it | 120.0 | 103.0 |
41 | Cindy | Berlin | sales | 95.0 | 120.0 |
42 | Yetunde | London | sales | 95.0 | 95.0 |
43 | Alice | Berlin | sales | 100.0 | 95.0 |
The previous salary is created, we can now create the percentage difference.
select *,
round((salary - prev_salary)/prev_salary, 4) * 100 as perc_diff
from (select id,
name,
city,
dept,
salary,
lag(salary) over (order by id asc) as prev_salary
from salaries);
id | name | city | dept | salary | prev_salary | perc_diff |
---|---|---|---|---|---|---|
21 | Dhanya | Chennai | hr | 75.0 | None | None |
22 | Bob | London | hr | 71.0 | 75.0 | -5.33 |
31 | Akira | Chennai | it | 89.0 | 71.0 | 25.35 |
32 | Grace | Berlin | it | 60.0 | 89.0 | -32.58 |
33 | Steven | London | it | 103.0 | 60.0 | 71.67 |
34 | Ramesh | Chennai | it | 103.0 | 103.0 | 0.0 |
35 | Frank | Berlin | it | 120.0 | 103.0 | 16.5 |
41 | Cindy | Berlin | sales | 95.0 | 120.0 | -20.830000000000002 |
42 | Yetunde | London | sales | 95.0 | 95.0 | 0.0 |
43 | Alice | Berlin | sales | 100.0 | 95.0 | 5.26 |
This can be written using a with
statement as well to improve the readability.
with sal2 as (
select id,
name,
city,
dept,
salary,
lag(salary) over (order by id asc) as prev_salary
from salaries)
select *,
round((salary - prev_salary)/prev_salary, 4) * 100 as perc_diff
from sal2;
id | name | city | dept | salary | prev_salary | perc_diff |
---|---|---|---|---|---|---|
21 | Dhanya | Chennai | hr | 75.0 | None | None |
22 | Bob | London | hr | 71.0 | 75.0 | -5.33 |
31 | Akira | Chennai | it | 89.0 | 71.0 | 25.35 |
32 | Grace | Berlin | it | 60.0 | 89.0 | -32.58 |
33 | Steven | London | it | 103.0 | 60.0 | 71.67 |
34 | Ramesh | Chennai | it | 103.0 | 103.0 | 0.0 |
35 | Frank | Berlin | it | 120.0 | 103.0 | 16.5 |
41 | Cindy | Berlin | sales | 95.0 | 120.0 | -20.830000000000002 |
42 | Yetunde | London | sales | 95.0 | 95.0 | 0.0 |
43 | Alice | Berlin | sales | 100.0 | 95.0 | 5.26 |
Alternately, we could’ve computed the lead and then computed the differences as well.
Let’s look at one more example.
Task 2:
How to compute the perc difference between the min and max values of salary in each department, while keeping the entries ordered by salary?
That is, you want to compute the percentage difference between the first (min) and last (max) values in each department.
Solution:
Let’s try using the min()
and max()
as window functions.
select id,
name,
city,
dept,
salary,
min(salary) over (partition by dept order by id asc) as min_salary,
max(salary) over (partition by dept order by id asc) as max_salary
from salaries
id | name | city | dept | salary | min_salary | max_salary |
---|---|---|---|---|---|---|
21 | Dhanya | Chennai | hr | 75.0 | 75.0 | 75.0 |
22 | Bob | London | hr | 71.0 | 71.0 | 75.0 |
31 | Akira | Chennai | it | 89.0 | 89.0 | 89.0 |
32 | Grace | Berlin | it | 60.0 | 60.0 | 89.0 |
33 | Steven | London | it | 103.0 | 60.0 | 103.0 |
34 | Ramesh | Chennai | it | 103.0 | 60.0 | 103.0 |
35 | Frank | Berlin | it | 120.0 | 60.0 | 120.0 |
41 | Cindy | Berlin | sales | 95.0 | 95.0 | 95.0 |
42 | Yetunde | London | sales | 95.0 | 95.0 | 95.0 |
43 | Alice | Berlin | sales | 100.0 | 95.0 | 100.0 |
There is a problem here.
The min values look fine but the max values aren’t.
Can we try using the first_value()
and last_value()
functions?
select id,
name,
city,
dept,
salary,
first_value(salary) over (partition by dept order by id asc) as min_salary,
last_value(salary) over (partition by dept order by id asc) as max_salary
from salaries
id | name | city | dept | salary | min_salary | max_salary |
---|---|---|---|---|---|---|
21 | Dhanya | Chennai | hr | 75.0 | 75.0 | 75.0 |
22 | Bob | London | hr | 71.0 | 75.0 | 71.0 |
31 | Akira | Chennai | it | 89.0 | 89.0 | 89.0 |
32 | Grace | Berlin | it | 60.0 | 89.0 | 60.0 |
33 | Steven | London | it | 103.0 | 89.0 | 103.0 |
34 | Ramesh | Chennai | it | 103.0 | 89.0 | 103.0 |
35 | Frank | Berlin | it | 120.0 | 89.0 | 120.0 |
41 | Cindy | Berlin | sales | 95.0 | 95.0 | 95.0 |
42 | Yetunde | London | sales | 95.0 | 95.0 | 95.0 |
43 | Alice | Berlin | sales | 100.0 | 95.0 | 100.0 |
This does not work as well. Can you figure what is happening here?
Because, eventhough we have defined the window parition by department, since we are using order by
, the processing is valid ONLY for the ‘Window Frame’, where the ‘frame’ goes from the first row to the current iteration row.
This is the default behaviour of the Frame. And this can be controlled, which we will do shortly.
But, let’s first understand the ‘Frame’ within the ‘Window’ clearly.
Ok. Have a look at the above ‘Salaries’ table.
When the iteration starts, the current row is the first row, whichever it is.
As the iteration proceeds for a given window (department), the frame size expands cumulatively from the beginning row up to the current row.
That is, when current row = 1, frame = 1st row only. But when the current row is 4, the frame includes the first 4 rows.
Look, how it is progressing below:
Window 1: Dept = HR
– Current row = 1, Frame rows = 1
– Current row = 2, Frame rows = 1,2
Window 2: Dept = IT
– Current row = 1, Frame rows = 1
– Current row = 2, Frame rows = 1,2
– Current row = 3, Frame rows = 1,2,3
– Current row = 4, Frame rows = 1,2,3,4
And so on..
We can abosolutely control the size of the frame in each iteration. I cover this in the next section.
But, let’s start by solving this usecase: Computing the min and max salary for each department, while keeping the order of neighbours.
Here it is.
select id,
name,
city,
dept,
salary,
first_value(salary) over (partition by dept order by salary asc) as min_salary,
last_value(salary) over (partition by dept
order by salary
rows between unbounded preceding and unbounded following) as max_salary
from salaries;
id | name | city | dept | salary | min_salary | max_salary |
---|---|---|---|---|---|---|
22 | Bob | London | hr | 71.0 | 71.0 | 75.0 |
21 | Dhanya | Chennai | hr | 75.0 | 71.0 | 75.0 |
32 | Grace | Berlin | it | 60.0 | 60.0 | 120.0 |
31 | Akira | Chennai | it | 89.0 | 60.0 | 120.0 |
33 | Steven | London | it | 103.0 | 60.0 | 120.0 |
34 | Ramesh | Chennai | it | 103.0 | 60.0 | 120.0 |
35 | Frank | Berlin | it | 120.0 | 60.0 | 120.0 |
41 | Cindy | Berlin | sales | 95.0 | 95.0 | 100.0 |
42 | Yetunde | London | sales | 95.0 | 95.0 | 100.0 |
43 | Alice | Berlin | sales | 100.0 | 95.0 | 100.0 |
The key part that makes the frame extend to the full width of the window is: rows between unbounded preceding and unbounded following
. We will break this down in detail in the following section.
5.1. Offset Functions you will need
- lag(value, offset): returns the value from the record that is offset rows behind the current one
- lead(value, offset): returns the value from the record that is offset rows ahead of the current one
- first_value(value): returns the value from the first row of the frame
- last_value(value): returns the value from the last row of the frame
- nth_value(value, n): returns the value from the n-th row of the frame
6. Type 4: Rolling Aggregates with ROWS BETWEEN clause
Let’s continue and take our understanding of window functions up a notch.
To get a complete understanding of window functions, you need to fully understand the rows between
command.
Again, let’s understand using an example.
Task:
Compute difference between the minimum and the second last value.
Solution:
Previously we saw how to compute the difference between the maximum and minimum. Let’s use those ideas to get the diff between second max and the minimum value.
First, to get the second last value we need to know the number of values in the group. Once you know this, you can get whichever value you want, be it second last, third from last and so on.
So, let’s get the number of items in each group first.
We are extending on top of the previous query, so look at the count(*)
statement in the below query.
select id,
name,
city,
dept,
salary,
first_value(salary) over (partition by dept order by salary asc) as min_salary,
last_value(salary) over (partition by dept
order by salary
rows between unbounded preceding and unbounded following) as max_salary,
count(*) over (partition by dept
rows between unbounded preceding and unbounded following) as count
from salaries;
id | name | city | dept | salary | min_salary | max_salary | count |
---|---|---|---|---|---|---|---|
22 | Bob | London | hr | 71.0 | 71.0 | 75.0 | 2 |
21 | Dhanya | Chennai | hr | 75.0 | 71.0 | 75.0 | 2 |
32 | Grace | Berlin | it | 60.0 | 60.0 | 120.0 | 5 |
31 | Akira | Chennai | it | 89.0 | 60.0 | 120.0 | 5 |
33 | Steven | London | it | 103.0 | 60.0 | 120.0 | 5 |
34 | Ramesh | Chennai | it | 103.0 | 60.0 | 120.0 | 5 |
35 | Frank | Berlin | it | 120.0 | 60.0 | 120.0 | 5 |
41 | Cindy | Berlin | sales | 95.0 | 95.0 | 100.0 | 3 |
42 | Yetunde | London | sales | 95.0 | 95.0 | 100.0 | 3 |
43 | Alice | Berlin | sales | 100.0 | 95.0 | 100.0 | 3 |
6.1. Understanding ROWS BETWEEN Clause
Note here the general structure of rows between
is as follows.
The syntax is:
ROWS BETWEEN <lower_bound> AND <upper_bound>
The purpose of the ROWS clause is to specify the starting and the ending row of the frame in relation to the ‘current row’.
Starting and ending rows might be fixed or relative to the current row based on the following keywords:
- CURRENT ROW: the current row
- UNBOUNDED PRECEDING: all rows before the current row -> fixed
- UNBOUNDED FOLLOWING: all rows after the current row -> fixed
- ‘n’ PRECEDING: ‘n’ rows before the current row -> relative
- ‘n’ FOLLOWING: ‘n’ rows after the current row -> relative
Here are few examples of how you can use the ROWS BETWEEN clause:
Example 1: ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
: Frame would include the entire window for each iteration.
Example 2: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
: would include from 1st row in window to current row.
Example 3: ROWS BETWEEN UNBOUNDED PRECEDING AND 2 PRECEDING
: would include from 1st row in window to 2 rows BEFORE the current row.
Example 4: ROWS BETWEEN UNBOUNDED PRECEDING AND 3 FOLLOWING
: would include from 1st row in window to 3 rows AFTER the current row.
Questions:
What would it be for:
- Two rows before and two rows after the current row? (Easy)
- Two rows before to one before the last row? (Intermediate). Hint: find the count.
So, how to get the second last value?
–> Sort by salary first, then, use nth_value()
and count
column to get the n-1’th item.
select * from salaries;
id | name | city | dept | salary |
---|---|---|---|---|
21 | Dhanya | Chennai | hr | 75.0 |
22 | Bob | London | hr | 71.0 |
31 | Akira | Chennai | it | 89.0 |
32 | Grace | Berlin | it | 60.0 |
33 | Steven | London | it | 103.0 |
34 | Ramesh | Chennai | it | 103.0 |
35 | Frank | Berlin | it | 120.0 |
41 | Cindy | Berlin | sales | 95.0 |
42 | Yetunde | London | sales | 95.0 |
43 | Alice | Berlin | sales | 100.0 |
Count the number of rows in each partition. We will later use this along with nth_value
function next.
select id,
name,
city,
dept,
salary,
first_value(salary) over (partition by dept order by salary asc) as min_salary,
last_value(salary) over (partition by dept
order by salary
rows between unbounded preceding and unbounded following) as max_salary,
count(*) over (partition by dept
order by salary
rows between unbounded preceding and unbounded following) as count
from salaries;
id | name | city | dept | salary | min_salary | max_salary | count |
---|---|---|---|---|---|---|---|
22 | Bob | London | hr | 71.0 | 71.0 | 75.0 | 2 |
21 | Dhanya | Chennai | hr | 75.0 | 71.0 | 75.0 | 2 |
32 | Grace | Berlin | it | 60.0 | 60.0 | 120.0 | 5 |
31 | Akira | Chennai | it | 89.0 | 60.0 | 120.0 | 5 |
33 | Steven | London | it | 103.0 | 60.0 | 120.0 | 5 |
34 | Ramesh | Chennai | it | 103.0 | 60.0 | 120.0 | 5 |
35 | Frank | Berlin | it | 120.0 | 60.0 | 120.0 | 5 |
41 | Cindy | Berlin | sales | 95.0 | 95.0 | 100.0 | 3 |
42 | Yetunde | London | sales | 95.0 | 95.0 | 100.0 | 3 |
43 | Alice | Berlin | sales | 100.0 | 95.0 | 100.0 | 3 |
Extract the second last value
This can be done as a subquery using the nth_value(salary, count-1)
call.
DROP TABLE If EXISTS TMP2;
CREATE TABLE TMP2 AS
with tmp as (
select id,
name,
city,
dept,
salary,
first_value(salary) over (partition by dept order by salary asc) as min_salary,
last_value(salary) over (partition by dept
order by salary
rows between unbounded preceding and unbounded following) as max_salary,
count(*) over (partition by dept
order by salary
rows between unbounded preceding and unbounded following) as count
from salaries)
select id,name,city,dept,salary,
min_salary,
nth_value(salary, count-1) over (partition by dept
order by salary
rows between unbounded preceding and unbounded following) as last_minus_1_salary
from tmp;
Finally, compute the difference between last_minus_1 and minimum salary.
SELECT *, (last_minus_1_salary - min_salary) as DIFF
FROM TMP2;
id | name | city | dept | salary | min_salary | last_minus_1_salary | DIFF |
---|---|---|---|---|---|---|---|
22 | Bob | London | hr | 71.0 | 71.0 | 71.0 | 0.0 |
21 | Dhanya | Chennai | hr | 75.0 | 71.0 | 71.0 | 0.0 |
32 | Grace | Berlin | it | 60.0 | 60.0 | 103.0 | 43.0 |
31 | Akira | Chennai | it | 89.0 | 60.0 | 103.0 | 43.0 |
33 | Steven | London | it | 103.0 | 60.0 | 103.0 | 43.0 |
34 | Ramesh | Chennai | it | 103.0 | 60.0 | 103.0 | 43.0 |
35 | Frank | Berlin | it | 120.0 | 60.0 | 103.0 | 43.0 |
41 | Cindy | Berlin | sales | 95.0 | 95.0 | 95.0 | 0.0 |
42 | Yetunde | London | sales | 95.0 | 95.0 | 95.0 | 0.0 |
43 | Alice | Berlin | sales | 100.0 | 95.0 | 95.0 | 0.0 |
Congratulations if you have made it till here! You should have a fair understanding of Window functions now.
Next step is to try and practice SQL Window functions. I also teach complete SQL for Data Science and more, I look forward to see you inside.