SQL Subquery, also known as inner query or nested query, are used to query data from one or more tables and then used in another SQL statement. They are an integral part of SQL since they allow complex database queries to be handled in a simpler way.
A subquery can be used in the SELECT
, INSERT
, UPDATE
, or DELETE
statement or inside another subquery. They are usually placed within parentheses and located in the WHERE
or HAVING
clause.
Let’s take an example to illustrate the concept of subqueries.
Also Read | SQL Tutorial – A Simple and Intuitive Guide
In this lesson we will cover:
- Understanding the need for SQL Subquery with an example
- Solution Approach
- Sub Query – Tougher example 1
- Solution using a separate table (without subquery)
- Solution using Sub Query Approach
- Sub Query – Tougher Example 2
- Solve by applying subquery
1. Understanding the need for SQL Subqueries
There are certain situations where you will absolutely need to write a Subquery. This will be clear if you understand with an example.
Consider the following three tables, the contents of which is shown below in code blocks.
- Employees
- Departments
- Sales
Question: Find out the employees who earn more than the overall average salary.
These tables can be created in SQLite DB using this schema.
Let’s print the contents of the tables
The Employees table
SELECT * from Employees;
ID | Name | Salary | DepartmentID |
---|---|---|---|
1 | Alice | 35000 | 1 |
2 | Bob | 40000 | 2 |
3 | Carol | 55000 | 3 |
4 | David | 60000 | 2 |
5 | Ethan | 70000 | 1 |
The Departments
table
SELECT * from Departments;
ID | Name |
---|---|
1 | HR |
2 | Finance |
3 | Marketing |
The Sales
table.
SELECT * from Sales;
EmployeeID | Product | Quantity |
---|---|---|
1 | A | 10 |
1 | B | 5 |
2 | A | 7 |
2 | C | 3 |
3 | B | 8 |
3 | C | 2 |
4 | A | 9 |
4 | B | 6 |
5 | C | 4 |
5 | A | 3 |
Remember, we want to find out the employees who earn more than the overall average salary.
2. Solution Approach
To solve this, we select the name
and salary
from Employees
table. But we need only the employees salaries that are more than the average.
So the plan is to use the WHERE
condition, to filter those observations where salary is more than average.
Let’s try it out.
SELECT Name, Salary
FROM Employees E1
WHERE Salary > AVG(Salary);
But this does NOT work!
This operation errors out giving an operational error – “Misuse of AVG(Salary) function”.
Why?
Since AVG() is a group function, you can only use it in a SELECT
statement or in a HAVING
clause.
This is quite important to understand.
So, how do we tackle this?
We need to use Subquery.
This can be done using two approaches.
Approach 1: Create query to compute the average and store in a new table. Then, reference it as a subquery.
Approach 2: Direct sub query
Let’s see both approaches.
APPROACH 1:
<--PAUSE AND THINK ABOUT HOW YOU WOULD DO APPROACH 1 -->
Use this link to solve hands-on.
- Create a new table that will contain the average salary.
DROP TABLE IF EXISTS AVGSALARY;
CREATE TABLE AVGSALARY AS
SELECT AVG(salary) as AVG_SALARY
FROM EMPLOYEES;
SELECT * FROM AVGSALARY;
AVG_SALARY |
---|
52000.0 |
Now, let’s write the query in the WHERE condition to check if the SALARY > AVG(SALARY)
.
SELECT Name, Salary
FROM Employees E1
WHERE Salary > (SELECT AVG_SALARY from AVGSALARY);
Name | Salary |
---|---|
Carol | 55000 |
David | 60000 |
Ethan | 70000 |
APPROACH 2: Directly subquery from the employees table
Alternately, you can directly subquery it instead of storing the average value in a separate table.
SELECT Name, Salary
FROM Employees E1
WHERE Salary > (SELECT AVG(SALARY) from Employees);
Name | Salary |
---|---|
Carol | 55000 |
David | 60000 |
Ethan | 70000 |
Hope you got the idea behind the need for Subqueries. Now, let’s look at a slightly tougher example.
In above example, we selected employees whose salary was above overall average. What if you want to select employees whose salary is above their respective department average?
3. Sub Query – Tougher example 1
Task: Find out the employees who earn more than (or equal to) the average salary in their respective department.
To do this, we select the name and salary from Employees table. But we need only the employees salaries that are more than the average in their respective department..
So in WHERE condition, we need to filter those observations where salary is more than the department average and NOT the total average.
Let’s first find the department average.
4. Solution using a separate table (without subquery)
Let’s first try to solve this without a subquery first. Then we will attempt to create the subquery.
That way, you will be clear how subquery works.
In the beginning, it is a good idea to create intermediate temp tables to store the result, just to make sure your query does what it is meant to do. You can create such temp tables, verify the values and drop the tables when they are no longer needed.
Approach 1:
1. Step 1: Create a new table that will hold the average department salaries.
2. Step 2: Merge this table with the Employees table
3. Step 3: Do the filtering.
First, Print the Employees table
SELECT * FROM EMPLOYEES;
ID | Name | Salary | DepartmentID |
---|---|---|---|
1 | Alice | 35000 | 1 |
2 | Bob | 40000 | 2 |
3 | Carol | 55000 | 3 |
4 | David | 60000 | 2 |
5 | Ethan | 70000 | 1 |
Compute the Average Salary by Department (in a new table)
DROP TABLE IF EXISTS DEPARTMENTSALARY;
CREATE TABLE DEPARTMENTSALARY AS
SELECT DEPARTMENTID,
AVG(SALARY) as AVG_SALARY
FROM EMPLOYEES
GROUP BY DepartmentID;
Show DepartmentSalary Table
SELECT * FROM DEPARTMENTSALARY;
DepartmentID | AVG_SALARY |
---|---|
1 | 52500.0 |
2 | 50000.0 |
3 | 55000.0 |
Merge this with the Employees Table
SELECT A.*, B.AVG_SALARY
FROM EMPLOYEES as A
LEFT JOIN
DEPARTMENTSALARY as B
ON A.DepartmentID = B.DepartmentID;
ID | Name | Salary | DepartmentID | AVG_SALARY |
---|---|---|---|---|
1 | Alice | 35000 | 1 | 52500.0 |
2 | Bob | 40000 | 2 | 50000.0 |
3 | Carol | 55000 | 3 | 55000.0 |
4 | David | 60000 | 2 | 50000.0 |
5 | Ethan | 70000 | 1 | 52500.0 |
Now, Let’s filter out salaries less than dept salaries
SELECT A.NAME, A.SALARY
FROM EMPLOYEES as A
LEFT JOIN
DEPARTMENTSALARY as B
ON A.DepartmentID = B.DepartmentID
WHERE SALARY >= AVG_SALARY;
Name | Salary |
---|---|
Carol | 55000 |
David | 60000 |
Ethan | 70000 |
So, that’s the step by step logic.
How do we do it without having to create a new table as seen in 1 Step?
5. Solution using Sub Query Approach
Observe the WHERE
condition here. Let’s try to make sense of it.
SELECT E1.Name, E1.Salary
FROM Employees E1
WHERE Salary >= (
SELECT AVG(E2.Salary)
FROM Employees E2
WHERE E1.DepartmentID = E2.DepartmentID
);
Name | Salary |
---|---|
Carol | 55000 |
David | 60000 |
Ethan | 70000 |
How to understand this query?
We want to select those salaries that are greater than department average. So, we need to apply this in this WHERE clause.
Think of it like this:
The filter condition applied in the WHERE clause gets applied to every row of the table. So, the query gets evaluated for every row to compute the department average.
The subquery SELECT AVG(Salary) FROM Employees GROUP BY DepartmentID
will first calculate the average salary for each department. Then the outer query will compare each employee’s salary with the average salary of their respective department and return those who earn more than the average.
6. Sub Query – Tougher Example 2
Task: Find out the employees who sold Product A AND sold more than the average quantity sold by all employees.
How to solve this?
To do this, this query requires information about both product as well as employees. This information does not lie in one table.
So, first step is to “Figure out what tables the data you need resides in.”
We have the following Tables:
- Sales (EmployeeID, Product, Quantity)
- Departments (ID, Name)
- Employees (ID, Name, Salary, DepartmentID)
Sales contains the EmployeeID
but the EmployeeName
has to come from the Employees
table.
<---PAUSE AND THINK HOW TO SOLVE FOR NEXT 120 seconds.--->
SOLVE HandsOn here.
7. Solve by applying subquery
Steps:
- First, find the total quantity sold by each employee, for product A.
- Then Filter those records greater than avg quantity sold.
First, Compute average total quantity sold by each employee for product A
SELECT E.Name,
S.Product,
SUM(S.Quantity) as TotalQuantity
FROM Employees E
JOIN Sales S ON E.ID = S.EmployeeID
WHERE S.PRODUCT = "A"
GROUP BY E.Name, S.Product;
Name | Product | TotalQuantity |
---|---|---|
Alice | A | 10 |
Bob | A | 7 |
David | A | 9 |
Ethan | A | 3 |
Then, filter those records where the Total Quantity (of product A) sold by an employees is greater than total average (of product A) sold by an employee?
SELECT E.Name,
S.Product,
SUM(S.Quantity) as TotalQuantity
FROM Employees E
JOIN Sales S ON E.ID = S.EmployeeID
WHERE S.PRODUCT = "A"
GROUP BY E.Name, S.Product
HAVING TotalQuantity > ("Average of total qty of prdt A sold by an employee")
);
Now, We need to compute “Average of total qty of prdt A sold by an employee”.
So, first compute total qty of product A sold by each employee. Then compute the average.
SELECT SUM(QUANTITY)
FROM SALES
WHERE PRODUCT="A"
GROUP BY EMPLOYEEID;
SUM(QUANTITY) |
---|
10 |
7 |
9 |
3 |
But we want the average of these. So, put a query above this.
SELECT AVG(SUMQTY) as AVG_A
FROM (SELECT SUM(QUANTITY) as SUMQTY
FROM SALES
WHERE PRODUCT="A"
GROUP BY EMPLOYEEID)
AVG_A |
---|
7.25 |
Now, apply this subquery inside the main query
SELECT E.Name,
S.Product,
SUM(S.Quantity) as TotalQuantity
FROM Employees E
JOIN Sales S ON E.ID = S.EmployeeID
WHERE S.PRODUCT = "A"
GROUP BY E.Name, S.Product
HAVING TotalQuantity > (SELECT AVG(SUMQTY) as AVG_A
FROM (SELECT SUM(QUANTITY) as SUMQTY
FROM SALES
WHERE PRODUCT="A"
GROUP BY EMPLOYEEID) as subquery
);
Name | Product | TotalQuantity |
---|---|---|
Alice | A | 10 |
David | A | 9 |
Here, the subquery calculates the average quantity sold of product ‘A’ by all employees. Then the outer query sums up the quantity of product ‘A’ sold by each employee and compares it with the average quantity calculated by the subquery.