Menu

How to retrieve the last record in each group in SQL?

Problem

You have a table with multiple rows of data for each group, and you want to select the last row from each group.

From the orders table below, select the first row in each group.

Input

order_id customer_id order_date total_amount
1 1 2023-09-01 100.50
2 2 2023-09-02 75.20
3 1 2023-09-03 50.75
4 3 2023-09-04 120.00
5 2 2023-09-05 90.30
6 3 2023-09-06 60.45

Try Hands-On: Fiddle

Create Input Table: Gist

Desired Output

order_id customer_id order_date total_amount
3 1 2023-09-03 50.75
5 2 2023-09-05 90.30
6 3 2023-09-06 60.45

There are multiple ways to do this. Let’s look at some of them.

Solution 1:

Using GROUP BY and JOIN

To achieve this, we can use a combination of JOIN and GroupBy. The idea is to find the last record for each group which in this case is customer and then join that result with the main table.

SELECT o.*
FROM orders o
JOIN (
    SELECT customer_id, MAX(order_date) AS last_order_date
    FROM orders
    GROUP BY customer_id
) t ON o.customer_id = t.customer_id AND o.order_date = t.last_order_date;

Explanation:

This query first finds the earliest order_date for each customer_id in the subquery. Then, it joins this result with the main orders table to retrieve the details of the first order for each customer.

Solution 2:

Using ROW_NUMBER() window functions

MySQL does not have ROW_NUMBER() function. So, switch to a different DB like SQLite before running the below code.

The idea is to assign a row number to each row within a partition (in this case, for each customer_id) based on order_date, and then filter out the rows with row number 1.

    WITH NumberedOrders AS (
        SELECT *,
               ROW_NUMBER() OVER(PARTITION BY customer_id ORDER BY order_date) AS rn
        FROM orders
    )

    SELECT *
    FROM NumberedOrders
    WHERE rn = 1;

Explanation:

The Common Table Expression (CTE) NumberedOrders assigns row numbers to each row in the orders table, partitioned by customer_id and ordered by order_date.

The main query then filters the results to only select rows where the row number rn is 1, which would be the first order for each customer.

This approach takes advantage of the capabilities of MySQL 8.0+, where window functions like ROW_NUMBER() were introduced.

Solution 3:

Using Correlated Subquery.

SELECT o.*
FROM orders o
WHERE order_date = (
    SELECT MIN(order_date)
    FROM orders o_inner
    WHERE o_inner.customer_id = o.customer_id
);

Explanation:

For each row in the orders table (represented by the alias o), we perform a subquery.

This subquery (using the alias o_inner) retrieves the smallest (or “first” in terms of time) order_date for that specific customer_id.

If the order_date of the current row matches the result of this subquery, the row is included in the output.

Recommended Courses

  1. SQL for Data Science – Level 1
  2. SQL for Data Science – Level 2
  3. SQL for Data Science – Level 3

Recommended Tutorial

  1. Introduction to SQL
  2. SQL Window Functons – Made Simple and Easy
  3. SQL Subquery

More SQL Questions

  1. How to select only rows with max value on a column?
  2. How to transpose columns to rows in SQL?
  3. How to select first row in each GROUP BY group?
  4. How to get the top 1 row of each group in SQL?

Course Preview

Machine Learning A-Z™: Hands-On Python & R In Data Science

Free Sample Videos:

Machine Learning A-Z™: Hands-On Python & R In Data Science

Machine Learning A-Z™: Hands-On Python & R In Data Science

Machine Learning A-Z™: Hands-On Python & R In Data Science

Machine Learning A-Z™: Hands-On Python & R In Data Science

Machine Learning A-Z™: Hands-On Python & R In Data Science