
SQL Select Distinct – A Comprehensive Guide

Let’s delve deep into one of SQL’s fundamental operations – the ‘Select Distinct’ statement.

Defining SQL Select Distinct

In SQL, the SELECT DISTINCT statement returns only distinct (different) values. It comes in handy with large databases where duplicate data is common. Its primary function is to eliminate duplicate rows from the result set, so that every row in the output is unique. The underlying table itself is not changed.

Syntax

Let’s break down the syntax to understand it better:

SELECT DISTINCT column1, column2, ...
FROM table_name;

The SELECT DISTINCT keywords are followed by the names of the columns whose distinct values we want. The FROM clause then names the table those columns come from.

Understanding SQL Select Distinct with an Example

Suppose we have a table ‘Orders’ that contains the following data:

OrderID CustomerID OrderAmount
------- ---------- -----------
1 101 300
2 102 150
3 103 200
4 101 300
5 102 150
6 104 500

Now, let’s say you want to find all the distinct ‘CustomerID’s from the ‘Orders’ table.

You can use the SELECT DISTINCT statement as follows:

Select Distinct on a Single Column

SELECT DISTINCT CustomerID
FROM Orders;

The output would be (row order is not guaranteed unless you add an ORDER BY clause):

CustomerID
----------
101
102
103
104

You can see that the output only contains unique ‘CustomerID’s.
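
To try the single-column query end to end, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in database. This is an assumption, since the article does not name a database engine. It loads the Orders rows from the example and compares a plain SELECT with SELECT DISTINCT. The ORDER BY is added only to make the printed order predictable.

```python
import sqlite3

# In-memory SQLite database; the rows mirror the Orders table from the example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders ("
    "OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, OrderAmount INTEGER)"
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(1, 101, 300), (2, 102, 150), (3, 103, 200),
     (4, 101, 300), (5, 102, 150), (6, 104, 500)],
)

# Plain SELECT keeps every row, including repeated customers.
all_ids = [row[0] for row in conn.execute(
    "SELECT CustomerID FROM Orders ORDER BY CustomerID")]

# SELECT DISTINCT collapses repeated values into one row each.
distinct_ids = [row[0] for row in conn.execute(
    "SELECT DISTINCT CustomerID FROM Orders ORDER BY CustomerID")]

print(all_ids)       # [101, 101, 102, 102, 103, 104]
print(distinct_ids)  # [101, 102, 103, 104]
```

The six input rows shrink to four because customers 101 and 102 each appear twice.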

Select Distinct on Multiple Columns

SELECT DISTINCT can also be used for more than one column. When used in this way, it considers the unique combinations of the specified columns.

SELECT DISTINCT CustomerID, OrderAmount
FROM Orders;

The result would be:

CustomerID OrderAmount
---------- -----------
101 300
102 150
103 200
104 500

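In the Orders table, CustomerID 101 and 102 each appear twice, but both of their rows carry the same OrderAmount. The pairs (101, 300) and (102, 150) are therefore duplicates, and SELECT DISTINCT returns each of them only once.

Had customer 101 also placed an order for a different amount, that row would form a new combination and would appear in the result as well.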
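
The combination rule can be checked with a short sketch, again assuming Python's built-in sqlite3 module as the database. The seventh order inserted below is hypothetical and is not part of the original Orders table; it exists only to show that a new (CustomerID, OrderAmount) pair produces an extra row.

```python
import sqlite3

# In-memory SQLite database loaded with the Orders rows from the example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders ("
    "OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, OrderAmount INTEGER)"
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(1, 101, 300), (2, 102, 150), (3, 103, 200),
     (4, 101, 300), (5, 102, 150), (6, 104, 500)],
)

# ORDER BY is only here to make the output order predictable.
query = (
    "SELECT DISTINCT CustomerID, OrderAmount "
    "FROM Orders ORDER BY CustomerID, OrderAmount"
)
pairs_before = conn.execute(query).fetchall()

# Hypothetical extra order: same customer as before, different amount.
conn.execute("INSERT INTO Orders VALUES (7, 101, 450)")
pairs_after = conn.execute(query).fetchall()

print(pairs_before)  # [(101, 300), (102, 150), (103, 200), (104, 500)]
print(pairs_after)   # [(101, 300), (101, 450), (102, 150), (103, 200), (104, 500)]
```

Customer 101 now appears twice in the result, because DISTINCT compares whole rows of the selected columns rather than the first column alone.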

Limitations of SQL Select Distinct

While the SELECT DISTINCT command is powerful, it has a cost. To guarantee that the returned rows are unique, the database has to compare every candidate row against the others, typically by sorting or hashing the result. On large datasets this extra work can noticeably slow a query down.

So, when dealing with large databases, use SELECT DISTINCT only where duplicates can actually occur and need to be removed, rather than adding it to every query by default.
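
One way to see what work a DISTINCT query involves is to ask the database for its query plan. The sketch below assumes SQLite (through Python's built-in sqlite3 module) and its EXPLAIN QUERY PLAN statement; other engines expose plans through different commands. The index name idx_orders_customer is hypothetical. The plan text varies between SQLite versions, so no specific output is shown here.

```python
import sqlite3

# In-memory SQLite database loaded with the Orders rows from the example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders ("
    "OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, OrderAmount INTEGER)"
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(1, 101, 300), (2, 102, 150), (3, 103, 200),
     (4, 101, 300), (5, 102, 150), (6, 104, 500)],
)


def plan(sql):
    """Return the human-readable 'detail' column of SQLite's query plan."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


query = "SELECT DISTINCT CustomerID FROM Orders"
plan_without_index = plan(query)

# Hypothetical index on the DISTINCT column. An index like this may let the
# engine read values in order instead of de-duplicating in a separate step.
conn.execute("CREATE INDEX idx_orders_customer ON Orders (CustomerID)")
plan_with_index = plan(query)

print("without index:", plan_without_index)
print("with index:   ", plan_with_index)
```

Comparing the two printed plans on your own data is a more reliable guide than a general rule, since the cost of DISTINCT depends on table size, available indexes, and the engine's optimizer.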

Conclusion

SQL’s SELECT DISTINCT statement is a potent tool in data manipulation, helping data analysts filter and organize their data sets more efficiently. While it may be limited by performance issues in large datasets, understanding its functionality and knowing when to use it is crucial for anyone working in the world of data.
