Problem
How to randomly select rows quickly from a large table in MySQL?
Input
employee_id | first_name | last_name | department | salary |
---|---|---|---|---|
1 | John | Doe | HR | 50000.00 |
2 | Jane | Smith | Engineering | 60000.00 |
3 | Alice | Johnson | Finance | 55000.00 |
4 | Bob | Brown | Sales | 52000.00 |
5 | Eva | Williams | Marketing | 48000.00 |
6 | Chris | Davis | Engineering | 62000.00 |
7 | Sarah | Wilson | Finance | 56000.00 |
8 | Mike | Jones | HR | 51000.00 |
9 | Linda | Martinez | Sales | 53000.00 |
10 | Tom | Moore | Marketing | 49000.00 |
11 | b0a93 | c51ea | Engineering | 59930.00 |
12 | 1a7d6 | 57e07 | Engineering | 56147.00 |
13 | 4d366 | 7775b | Engineering | 54771.00 |
14 | 6fb08 | b77e7 | HR | 59799.00 |
15 | 4a00a | 88ec8 | Finance | 56144.00 |
16 | 77288 | d1c5c | Marketing | 50731.00 |
17 | 3fdca | e51ac | Sales | 58556.00 |
18 | 73313 | 37117 | Engineering | 52198.00 |
19 | d90a8 | f6b7c | Finance | 53655.00 |
20 | 8ec6a | 833c8 | Finance | 53730.00 |
Try Hands-On: Fiddle
Create Input Table: Gist
Desired Output
employee_id | first_name | last_name | department | salary |
---|---|---|---|---|
9 | Linda | Martinez | Sales | 53000.00 |
19 | 9bc59 | 3de01 | Finance | 55591.00 |
11 | ff441 | bf511 | Finance | 55456.00 |
2 | Jane | Smith | Engineering | 60000.00 |
3 | Alice | Johnson | Finance | 55000.00 |
Solution 1:
Using generated Random number
SELECT * FROM employees
ORDER BY RAND()
LIMIT 5;
Explanation:
SELECT * FROM employees: This part of the query selects all columns from the employees table.
ORDER BY RAND(): This clause orders the rows randomly. RAND() generates a random number between 0 and 1 for each row, effectively shuffling the rows.
LIMIT 5: This limits the result set to the first 5 rows. You can change the number to the desired number of randomly selected rows.
Solution 2:
More efficient / faster approach.
Using generated Random number and WHERE
SELECT * FROM employees
WHERE RAND() <= 0.1
ORDER BY RAND()
LIMIT5;
Explanation:
- SELECT * FROM employees: This part of the query selects all columns from the employees table.
-
WHERE RAND() <= 0.1: This condition filters rows with a random number less than or equal to 0.1. By adjusting the value 0.1, you can control the probability of selecting rows. For example, setting it to 0.1 means there’s a 10% chance of selecting each row.
-
ORDER BY RAND(): This clause still orders the filtered rows randomly to ensure randomness within the selected subset.
-
LIMIT 5: This limits the result set to the first 5 rows from the filtered and randomly ordered subset.
Solution 3:
SELECT * FROM employees
WHERE RAND() <= (
SELECT (5 / COUNT(*)) FROM employees
)
ORDER BY RAND()
LIMIT 5;
Explanation:
WHERE RAND() <= (…): Here, we use a subquery to calculate a random selection probability. (5 / COUNT(*)) calculates the probability based on the number of rows in the table. In this case, it selects approximately 5 rows out of the total count.
You can adjust the 5 to control the desired number of rows to select.