Menu

How to select first row in each ‘Group By’ group?

Problem

You have a table with multiple rows of data for each group, and you want to select the first row from each group.

From the orders table below, select the first row in each group.

Input

order_id customer_id order_date product_name quantity
1 1 2023-01-01 Laptop 1
2 1 2023-01-02 Mouse 2
3 2 2023-01-03 Keyboard 1
4 2 2023-01-04 Monitor 1
5 3 2023-01-05 Mouse Pad 5
6 3 2023-01-06 USB Cable 3

Try Hands-On: Fiddle

Create Input Table: Gist

Desired Output

We want only the first record for each customer_id.

order_id customer_id order_date product_name quantity
1 1 2023-01-01 Laptop 1
3 2 2023-01-03 Keyboard 1
5 3 2023-01-05 Mouse Pad 5

There are multiple ways to do this. Let’s look at some of them.

Solution 1:

Using GROUP BY and JOIN

To achieve this, we can use a combination of JOIN and subquery. The idea is to find the earliest order date for each customer and then join that result with the main table.

SELECT o1.*
FROM orders o1
JOIN (
    SELECT customer_id, MIN(order_date) as first_order_date
    FROM orders
    GROUP BY customer_id
) o2 ON o1.customer_id = o2.customer_id AND o1.order_date = o2.first_order_date;

Explanation:

This query first finds the earliest order_date for each customer_id in the subquery. Then, it joins this result with the main orders table to retrieve the details of the first order for each customer.

Solution 2:

Using ROW_NUMBER() window functions

MySQL does not have ROW_NUMBER() function. So, switch to a different DB like SQLite before running the below code.

The idea is to assign a row number to each row within a partition (in this case, for each customer_id) based on order_date, and then filter out the rows with row number 1.

    WITH NumberedOrders AS (
        SELECT *,
               ROW_NUMBER() OVER(PARTITION BY customer_id ORDER BY order_date) AS rn
        FROM orders
    )

    SELECT *
    FROM NumberedOrders
    WHERE rn = 1;

Explanation:

The Common Table Expression (CTE) NumberedOrders assigns row numbers to each row in the orders table, partitioned by customer_id and ordered by order_date.

The main query then filters the results to only select rows where the row number rn is 1, which would be the first order for each customer.

This approach takes advantage of the capabilities of MySQL 8.0+, where window functions like ROW_NUMBER() were introduced.

Solution 3:

Using Correlated Subquery.

SELECT o.*
FROM orders o
WHERE order_date = (
    SELECT MIN(order_date)
    FROM orders o_inner
    WHERE o_inner.customer_id = o.customer_id
);

Explanation:

For each row in the orders table (represented by the alias o), we perform a subquery.

This subquery (using the alias o_inner) retrieves the smallest (or “first” in terms of time) order_date for that specific customer_id.

If the order_date of the current row matches the result of this subquery, the row is included in the output.

Recommended Courses

  1. SQL for Data Science – Level 1
  2. SQL for Data Science – Level 2
  3. SQL for Data Science – Level 3

Recommended Tutorial

  1. Introduction to SQL
  2. SQL Window Functons – Made Simple and Easy
  3. SQL Subquery

Course Preview

Machine Learning A-Z™: Hands-On Python & R In Data Science

Free Sample Videos:

Machine Learning A-Z™: Hands-On Python & R In Data Science

Machine Learning A-Z™: Hands-On Python & R In Data Science

Machine Learning A-Z™: Hands-On Python & R In Data Science

Machine Learning A-Z™: Hands-On Python & R In Data Science

Machine Learning A-Z™: Hands-On Python & R In Data Science