How to get top n results in each ‘group by’ group in SQL?

Problem

You have a table with multiple records. For each distinct value in one of the columns (i.e., each ‘group by’ group), you want to retrieve the top ‘n’ rows based on some criteria.

In the example below, get the top 2 entries for each salesperson based on the ‘amount’ column.

Input

id	salesperson	amount	sale_date
1	John	100.00	2023-09-10
2	John	200.00	2023-09-15
3	John	50.00	2023-09-18
4	Jane	150.00	2023-09-11
5	Jane	300.00	2023-09-14
6	Doe	400.00	2023-09-10
7	Doe	100.00	2023-09-16
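
To try the queries locally, the input table can be recreated in a throwaway database. The following is a minimal sketch that assumes Python's built-in sqlite3 module and an in-memory database; the column types are an assumption, since the article only shows the data.

```python
import sqlite3

# In-memory SQLite database; table and column names follow the example above.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE sales (
        id          INTEGER PRIMARY KEY,
        salesperson TEXT NOT NULL,
        amount      REAL NOT NULL,  -- assumed type; the article shows two decimals
        sale_date   TEXT NOT NULL   -- ISO dates stored as text in SQLite
    )
    """
)

rows = [
    (1, "John", 100.00, "2023-09-10"),
    (2, "John", 200.00, "2023-09-15"),
    (3, "John", 50.00, "2023-09-18"),
    (4, "Jane", 150.00, "2023-09-11"),
    (5, "Jane", 300.00, "2023-09-14"),
    (6, "Doe", 400.00, "2023-09-10"),
    (7, "Doe", 100.00, "2023-09-16"),
]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
conn.commit()

# Print the table back to confirm the load.
for row in conn.execute("SELECT * FROM sales ORDER BY id"):
    print(row)
```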

Desired Output

salesperson	amount	sale_date
Doe	400.00	2023-09-10
Doe	100.00	2023-09-16
Jane	300.00	2023-09-14
Jane	150.00	2023-09-11
John	200.00	2023-09-15
John	100.00	2023-09-10

Solution 1:

Using Window Functions

    WITH RankedSales AS (
        SELECT
            salesperson,
            amount,
            sale_date,
            ROW_NUMBER() OVER(PARTITION BY salesperson ORDER BY amount DESC) AS rn
        FROM sales
    )

    SELECT salesperson, amount, sale_date
    FROM RankedSales
    WHERE rn <= 2
    ORDER BY salesperson, amount DESC;

Explanation:

ROW_NUMBER() OVER(PARTITION BY salesperson ORDER BY amount DESC) – This assigns a unique sequential integer to the rows within each partition (here, each salesperson), starting at 1 for the highest amount.

We use a Common Table Expression (CTE) named RankedSales to make our main query simpler.

In the main query, we filter the results to include only rows where the row number (rn) is 2 or less.
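
To see what the row numbers look like before the filter is applied, the sketch below runs the ranking step and the full query against the sample data. It assumes Python's built-in sqlite3 module with an SQLite build of 3.25 or newer (the first release with window functions); the ORDER BY clauses are added here only to make the printed order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, salesperson TEXT, "
    "amount REAL, sale_date TEXT)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (1, "John", 100.00, "2023-09-10"),
        (2, "John", 200.00, "2023-09-15"),
        (3, "John", 50.00, "2023-09-18"),
        (4, "Jane", 150.00, "2023-09-11"),
        (5, "Jane", 300.00, "2023-09-14"),
        (6, "Doe", 400.00, "2023-09-10"),
        (7, "Doe", 100.00, "2023-09-16"),
    ],
)

# The body of the CTE: every row plus its position within its salesperson.
ranked_sql = """
    SELECT salesperson, amount, sale_date,
           ROW_NUMBER() OVER (PARTITION BY salesperson ORDER BY amount DESC) AS rn
    FROM sales
"""

ranked = conn.execute(ranked_sql + " ORDER BY salesperson, rn").fetchall()
for row in ranked:
    print(row)  # John's 50.00 sale gets rn = 3 and is filtered out below

# The full query: keep only the first two rows of each partition.
top2 = conn.execute(
    f"""
    WITH RankedSales AS ({ranked_sql})
    SELECT salesperson, amount, sale_date
    FROM RankedSales
    WHERE rn <= 2
    ORDER BY salesperson, amount DESC
    """
).fetchall()
for row in top2:
    print(row)
```

Changing the 2 in the WHERE clause to any other n returns the top n rows per salesperson.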

Solution 2:

Using a Correlated Subquery

    SELECT s1.salesperson, s1.amount, s1.sale_date
    FROM sales s1
    WHERE (
        SELECT COUNT(*)
        FROM sales s2
        WHERE s1.salesperson = s2.salesperson AND s2.amount >= s1.amount
    ) <= 2
    ORDER BY s1.salesperson, s1.amount DESC;

Explanation:

The outer query iterates over each row in the sales table, aliasing it as s1.

The inner query (correlated subquery) counts how many records for the same salesperson have an amount greater than or equal to that of the current record. The count includes the current record itself, so when a salesperson's amounts are all distinct it equals the row's position from the top.

The condition <= 2 in the outer query keeps only rows whose count is at most 2, that is, the top 2 sales for each salesperson.
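
One behaviour worth checking is what happens when amounts tie, because the count uses >=. The sketch below uses hypothetical rows that are not part of the article's table (a salesperson "Ann" with two 200.00 sales) and assumes Python's built-in sqlite3 module; the window-function comparison needs SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (salesperson TEXT, amount REAL)")
# Hypothetical data: two sales tie for second place.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("Ann", 300.0), ("Ann", 200.0), ("Ann", 200.0), ("Ann", 50.0)],
)

# Correlated subquery: each tied 200.00 row counts three rows with
# amount >= 200.00 (the 300.00 row and both tied rows), so both fail "<= 2".
correlated = conn.execute(
    """
    SELECT s1.salesperson, s1.amount
    FROM sales s1
    WHERE (
        SELECT COUNT(*)
        FROM sales s2
        WHERE s1.salesperson = s2.salesperson AND s2.amount >= s1.amount
    ) <= 2
    ORDER BY s1.salesperson, s1.amount DESC
    """
).fetchall()
print(correlated)

# ROW_NUMBER() numbers the tied rows 2 and 3, so exactly two rows survive.
windowed = conn.execute(
    """
    SELECT salesperson, amount
    FROM (
        SELECT salesperson, amount,
               ROW_NUMBER() OVER (PARTITION BY salesperson ORDER BY amount DESC) AS rn
        FROM sales
    ) AS ranked
    WHERE rn <= 2
    ORDER BY salesperson, amount DESC
    """
).fetchall()
print(windowed)
```

With this data the correlated subquery returns only the 300.00 row, while ROW_NUMBER() returns two rows and picks one of the tied sales arbitrarily. The two solutions agree on the article's sample table because no salesperson has duplicate amounts.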

The main drawback of this method is that the correlated subquery will execute for each row in the sales table, which could be inefficient for larger datasets.
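
A composite index on the columns the subquery filters by is one common way to reduce that per-row cost. The sketch below is only an illustration, assuming Python's built-in sqlite3 module; the index name idx_sales_person_amount is hypothetical, and whether the planner uses the index depends on the database and the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, salesperson TEXT, "
    "amount REAL, sale_date TEXT)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (1, "John", 100.00, "2023-09-10"),
        (2, "John", 200.00, "2023-09-15"),
        (3, "John", 50.00, "2023-09-18"),
        (4, "Jane", 150.00, "2023-09-11"),
        (5, "Jane", 300.00, "2023-09-14"),
        (6, "Doe", 400.00, "2023-09-10"),
        (7, "Doe", 100.00, "2023-09-16"),
    ],
)

query = """
    SELECT s1.salesperson, s1.amount, s1.sale_date
    FROM sales s1
    WHERE (
        SELECT COUNT(*)
        FROM sales s2
        WHERE s1.salesperson = s2.salesperson AND s2.amount >= s1.amount
    ) <= 2
    ORDER BY s1.salesperson, s1.amount DESC
"""

before = conn.execute(query).fetchall()

# Hypothetical index covering both columns used by the inner WHERE clause.
conn.execute("CREATE INDEX idx_sales_person_amount ON sales (salesperson, amount)")

after = conn.execute(query).fetchall()

# The plan text differs between SQLite versions, so it is printed for
# inspection rather than compared against a fixed string.
for step in conn.execute("EXPLAIN QUERY PLAN " + query):
    print(step)
```

The index changes only how the rows are found, not which rows are returned, so the result is the same before and after it is created.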

Still, this method provides a way to achieve the desired result without relying on window functions or session variables, making it useful in certain scenarios or older versions of databases that don’t support more advanced features.
