Problem
Let’s see how to find duplicate values in an SQL table.
Input
id | name | enrollment_number |
---|---|---|
1 | Alice | ENR123 |
2 | Bob | ENR124 |
3 | Charlie | ENR125 |
4 | David | ENR123 |
5 | Edward | ENR126 |
6 | Frank | ENR124 |
Try Hands-On: Fiddle
Create Source tables: Code Gist
Desired Output
enrollment_number | count |
---|---|
ENR123 | 2 |
ENR124 | 2 |
Solution 1:
Using GROUP BY
SELECT enrollment_number, COUNT(enrollment_number) as count
FROM students
GROUP BY enrollment_number
HAVING count >1;
Explanation:
GROUP BY
enrollment_number: This will create groups of rows based on unique enrollment_number values.-
HAVING count >
1: This is a post-grouping filter that allows us to only consider those groups where the count of enrollment_number is more than 1. Hence, it filters the groups to show only the duplicates. COUNT(enrollment_numbe
r) as count: This will count the number of occurrences of each enrollment_number in the grouped result.
Solution 2:
Using correlated subquery
SELECT name, enrollment_number
FROM students
WHERE enrollment_number IN (
SELECT enrollment_number
FROM students
GROUP BY enrollment_number
HAVING COUNT(enrollment_number) > 1
);
Result:
name | enrollment_number |
---|---|
Alice | ENR123 |
Bob | ENR124 |
David | ENR123 |
Frank | ENR124 |
Explanation:
The inner query (subquery) selects all the enrollment_numbers that have a count greater than 1, thus identifying duplicates.
The outer query then filters the main table, students, to show only rows where the enrollment_number matches one from the list identified as duplicates in the subquery.