Menu

How to randomly select rows from a large table in MySQL super fast?

Problem

How to randomly select rows quickly from a large table in MySQL?

Input

employee_id first_name last_name department salary
1 John Doe HR 50000.00
2 Jane Smith Engineering 60000.00
3 Alice Johnson Finance 55000.00
4 Bob Brown Sales 52000.00
5 Eva Williams Marketing 48000.00
6 Chris Davis Engineering 62000.00
7 Sarah Wilson Finance 56000.00
8 Mike Jones HR 51000.00
9 Linda Martinez Sales 53000.00
10 Tom Moore Marketing 49000.00
11 b0a93 c51ea Engineering 59930.00
12 1a7d6 57e07 Engineering 56147.00
13 4d366 7775b Engineering 54771.00
14 6fb08 b77e7 HR 59799.00
15 4a00a 88ec8 Finance 56144.00
16 77288 d1c5c Marketing 50731.00
17 3fdca e51ac Sales 58556.00
18 73313 37117 Engineering 52198.00
19 d90a8 f6b7c Finance 53655.00
20 8ec6a 833c8 Finance 53730.00

Try Hands-On: Fiddle

Create Input Table: Gist

Desired Output

employee_id first_name last_name department salary
9 Linda Martinez Sales 53000.00
19 9bc59 3de01 Finance 55591.00
11 ff441 bf511 Finance 55456.00
2 Jane Smith Engineering 60000.00
3 Alice Johnson Finance 55000.00

Solution 1:

Using generated Random number

SELECT * FROM employees
ORDER BY RAND()
LIMIT 5;

Explanation:

SELECT * FROM employees: This part of the query selects all columns from the employees table.

ORDER BY RAND(): This clause orders the rows randomly. RAND() generates a random number between 0 and 1 for each row, effectively shuffling the rows.

LIMIT 5: This limits the result set to the first 5 rows. You can change the number to the desired number of randomly selected rows.

Solution 2:

More efficient / faster approach.

Using generated Random number and WHERE

SELECT * FROM employees
WHERE RAND() <= 0.1
ORDER BY RAND()
LIMIT5;

Explanation:

  1. SELECT * FROM employees: This part of the query selects all columns from the employees table.

  2. WHERE RAND() <= 0.1: This condition filters rows with a random number less than or equal to 0.1. By adjusting the value 0.1, you can control the probability of selecting rows. For example, setting it to 0.1 means there’s a 10% chance of selecting each row.

  3. ORDER BY RAND(): This clause still orders the filtered rows randomly to ensure randomness within the selected subset.

  4. LIMIT 5: This limits the result set to the first 5 rows from the filtered and randomly ordered subset.

Solution 3:

SELECT * FROM employees
WHERE RAND() <= (
    SELECT (5 / COUNT(*)) FROM employees
)
ORDER BY RAND()
LIMIT 5;

Explanation:

WHERE RAND() <= (…): Here, we use a subquery to calculate a random selection probability. (5 / COUNT(*)) calculates the probability based on the number of rows in the table. In this case, it selects approximately 5 rows out of the total count.

You can adjust the 5 to control the desired number of rows to select.

Recommended Courses

  1. SQL for Data Science – Level 1
  2. SQL for Data Science – Level 2
  3. SQL for Data Science – Level 3

Recommended Tutorial

  1. Introduction to SQL
  2. SQL Window Functons – Made Simple and Easy
  3. SQL Subquery

More SQL Questions

  1. How to efficiently convert rows to columns in SQL?
  2. How to transpose columns to rows in SQL?
  3. How to select first row in each GROUP BY group?

Course Preview

Machine Learning A-Z™: Hands-On Python & R In Data Science

Free Sample Videos:

Machine Learning A-Z™: Hands-On Python & R In Data Science

Machine Learning A-Z™: Hands-On Python & R In Data Science

Machine Learning A-Z™: Hands-On Python & R In Data Science

Machine Learning A-Z™: Hands-On Python & R In Data Science

Machine Learning A-Z™: Hands-On Python & R In Data Science