Menu

SQL Subquery – How to write subqueries using simple examples

SQL Subquery, also known as inner query or nested query, are used to query data from one or more tables and then used in another SQL statement. They are an integral part of SQL since they allow complex database queries to be handled in a simpler way.

A subquery can be used in the SELECT, INSERT, UPDATE, or DELETE statement or inside another subquery. They are usually placed within parentheses and located in the WHERE or HAVING clause.

Let’s take an example to illustrate the concept of subqueries.

Also Read | SQL Tutorial – A Simple and Intuitive Guide

In this lesson we will cover:

  1. Understanding the need for SQL Subquery with an example
  2. Solution Approach
  3. Sub Query – Tougher example 1
  4. Solution using a separate table (without subquery)
  5. Solution using Sub Query Approach
  6. Sub Query – Tougher Example 2
  7. Solve by applying subquery

1. Understanding the need for SQL Subqueries

There are certain situations where you will absolutely need to write a Subquery. This will be clear if you understand with an example.

Consider the following three tables, the contents of which is shown below in code blocks.

  1. Employees
  2. Departments
  3. Sales

Question: Find out the employees who earn more than the overall average salary.

These tables can be created in SQLite DB using this schema.

Let’s print the contents of the tables

The Employees table

SELECT * from Employees;
ID Name Salary DepartmentID
1 Alice 35000 1
2 Bob 40000 2
3 Carol 55000 3
4 David 60000 2
5 Ethan 70000 1

The Departments table

SELECT * from Departments;
ID Name
1 HR
2 Finance
3 Marketing

The Sales table.

SELECT * from Sales;
EmployeeID Product Quantity
1 A 10
1 B 5
2 A 7
2 C 3
3 B 8
3 C 2
4 A 9
4 B 6
5 C 4
5 A 3

Remember, we want to find out the employees who earn more than the overall average salary.

2. Solution Approach

To solve this, we select the name and salary from Employees table. But we need only the employees salaries that are more than the average.

So the plan is to use the WHERE condition, to filter those observations where salary is more than average.

Let’s try it out.

SELECT Name, Salary 
FROM Employees E1
WHERE Salary > AVG(Salary);

But this does NOT work!

This operation errors out giving an operational error – “Misuse of AVG(Salary) function”.

Why?

Since AVG() is a group function, you can only use it in a SELECT statement or in a HAVING clause.

This is quite important to understand.

So, how do we tackle this?

We need to use Subquery.

This can be done using two approaches.

Approach 1: Create query to compute the average and store in a new table. Then, reference it as a subquery.

Approach 2: Direct sub query

Let’s see both approaches.

APPROACH 1:

<--PAUSE AND THINK ABOUT HOW YOU WOULD DO APPROACH 1 -->

Use this link to solve hands-on.

  1. Create a new table that will contain the average salary.
DROP TABLE IF EXISTS AVGSALARY;

CREATE TABLE AVGSALARY AS
SELECT AVG(salary) as AVG_SALARY
FROM EMPLOYEES;

SELECT * FROM AVGSALARY;
AVG_SALARY
52000.0

Now, let’s write the query in the WHERE condition to check if the SALARY > AVG(SALARY).

SELECT Name, Salary 
FROM Employees E1
WHERE Salary > (SELECT AVG_SALARY from AVGSALARY);
Name Salary
Carol 55000
David 60000
Ethan 70000

APPROACH 2: Directly subquery from the employees table

Alternately, you can directly subquery it instead of storing the average value in a separate table.

SELECT Name, Salary 
FROM Employees E1
WHERE Salary > (SELECT AVG(SALARY) from Employees);
Name Salary
Carol 55000
David 60000
Ethan 70000

Hope you got the idea behind the need for Subqueries. Now, let’s look at a slightly tougher example.

In above example, we selected employees whose salary was above overall average. What if you want to select employees whose salary is above their respective department average?

3. Sub Query – Tougher example 1

Task: Find out the employees who earn more than (or equal to) the average salary in their respective department.

To do this, we select the name and salary from Employees table. But we need only the employees salaries that are more than the average in their respective department..

So in WHERE condition, we need to filter those observations where salary is more than the department average and NOT the total average.

Let’s first find the department average.

4. Solution using a separate table (without subquery)

Let’s first try to solve this without a subquery first. Then we will attempt to create the subquery.
That way, you will be clear how subquery works.

In the beginning, it is a good idea to create intermediate temp tables to store the result, just to make sure your query does what it is meant to do. You can create such temp tables, verify the values and drop the tables when they are no longer needed.

Approach 1:
1. Step 1: Create a new table that will hold the average department salaries.
2. Step 2: Merge this table with the Employees table
3. Step 3: Do the filtering.

First, Print the Employees table

SELECT * FROM EMPLOYEES;
ID Name Salary DepartmentID
1 Alice 35000 1
2 Bob 40000 2
3 Carol 55000 3
4 David 60000 2
5 Ethan 70000 1

Compute the Average Salary by Department (in a new table)

DROP TABLE IF EXISTS DEPARTMENTSALARY;

CREATE TABLE DEPARTMENTSALARY AS
SELECT  DEPARTMENTID, 
        AVG(SALARY) as AVG_SALARY
FROM EMPLOYEES
GROUP BY DepartmentID;

Show DepartmentSalary Table

SELECT * FROM DEPARTMENTSALARY;
DepartmentID AVG_SALARY
1 52500.0
2 50000.0
3 55000.0

Merge this with the Employees Table

SELECT A.*, B.AVG_SALARY
FROM EMPLOYEES as A
LEFT JOIN 
DEPARTMENTSALARY as B 
ON A.DepartmentID = B.DepartmentID;
ID Name Salary DepartmentID AVG_SALARY
1 Alice 35000 1 52500.0
2 Bob 40000 2 50000.0
3 Carol 55000 3 55000.0
4 David 60000 2 50000.0
5 Ethan 70000 1 52500.0

Now, Let’s filter out salaries less than dept salaries

SELECT A.NAME, A.SALARY
FROM EMPLOYEES as A
LEFT JOIN 
DEPARTMENTSALARY as B 
ON A.DepartmentID = B.DepartmentID 
WHERE SALARY >= AVG_SALARY;
Name Salary
Carol 55000
David 60000
Ethan 70000

So, that’s the step by step logic.

How do we do it without having to create a new table as seen in 1 Step?

5. Solution using Sub Query Approach

Observe the WHERE condition here. Let’s try to make sense of it.

SELECT E1.Name, E1.Salary 
FROM Employees E1
WHERE Salary >= (
    SELECT AVG(E2.Salary) 
    FROM Employees E2
    WHERE E1.DepartmentID = E2.DepartmentID
);
Name Salary
Carol 55000
David 60000
Ethan 70000

How to understand this query?

We want to select those salaries that are greater than department average. So, we need to apply this in this WHERE clause.

Think of it like this:

The filter condition applied in the WHERE clause gets applied to every row of the table. So, the query gets evaluated for every row to compute the department average.

The subquery SELECT AVG(Salary) FROM Employees GROUP BY DepartmentID will first calculate the average salary for each department. Then the outer query will compare each employee’s salary with the average salary of their respective department and return those who earn more than the average.

6. Sub Query – Tougher Example 2

Task: Find out the employees who sold Product A AND sold more than the average quantity sold by all employees.

How to solve this?

To do this, this query requires information about both product as well as employees. This information does not lie in one table.

So, first step is to “Figure out what tables the data you need resides in.”

We have the following Tables:

  1. Sales (EmployeeID, Product, Quantity)
  2. Departments (ID, Name)
  3. Employees (ID, Name, Salary, DepartmentID)

Sales contains the EmployeeID but the EmployeeName has to come from the Employees table.

<---PAUSE AND THINK HOW TO SOLVE FOR NEXT 120 seconds.--->

SOLVE HandsOn here.

7. Solve by applying subquery

Steps:

  1. First, find the total quantity sold by each employee, for product A.
  2. Then Filter those records greater than avg quantity sold.

First, Compute average total quantity sold by each employee for product A

SELECT  E.Name, 
        S.Product, 
        SUM(S.Quantity) as TotalQuantity
FROM Employees E
JOIN Sales S ON E.ID = S.EmployeeID
WHERE S.PRODUCT = "A"
GROUP BY E.Name, S.Product;
Name Product TotalQuantity
Alice A 10
Bob A 7
David A 9
Ethan A 3

Then, filter those records where the Total Quantity (of product A) sold by an employees is greater than total average (of product A) sold by an employee?

SELECT  E.Name, 
        S.Product, 
        SUM(S.Quantity) as TotalQuantity
FROM Employees E
JOIN Sales S ON E.ID = S.EmployeeID
WHERE S.PRODUCT = "A"
GROUP BY E.Name, S.Product
HAVING TotalQuantity > ("Average of total qty of prdt A sold by an employee")
);

Now, We need to compute “Average of total qty of prdt A sold by an employee”.

So, first compute total qty of product A sold by each employee. Then compute the average.

SELECT SUM(QUANTITY) 
FROM SALES 
WHERE PRODUCT="A" 
GROUP BY EMPLOYEEID;
SUM(QUANTITY)
10
7
9
3

But we want the average of these. So, put a query above this.

SELECT AVG(SUMQTY) as AVG_A
FROM (SELECT SUM(QUANTITY) as SUMQTY 
      FROM SALES 
      WHERE PRODUCT="A"
      GROUP BY EMPLOYEEID)
AVG_A
7.25

Now, apply this subquery inside the main query

SELECT  E.Name, 
        S.Product, 
        SUM(S.Quantity) as TotalQuantity
FROM Employees E
JOIN Sales S ON E.ID = S.EmployeeID
WHERE S.PRODUCT = "A"
GROUP BY E.Name, S.Product
HAVING TotalQuantity > (SELECT AVG(SUMQTY) as AVG_A 
                        FROM (SELECT SUM(QUANTITY) as SUMQTY 
                              FROM SALES 
                              WHERE PRODUCT="A"
                              GROUP BY EMPLOYEEID) as subquery
);
Name Product TotalQuantity
Alice A 10
David A 9

Here, the subquery calculates the average quantity sold of product ‘A’ by all employees. Then the outer query sums up the quantity of product ‘A’ sold by each employee and compares it with the average quantity calculated by the subquery.

Course Preview

Machine Learning A-Z™: Hands-On Python & R In Data Science

Free Sample Videos:

Machine Learning A-Z™: Hands-On Python & R In Data Science

Machine Learning A-Z™: Hands-On Python & R In Data Science

Machine Learning A-Z™: Hands-On Python & R In Data Science

Machine Learning A-Z™: Hands-On Python & R In Data Science

Machine Learning A-Z™: Hands-On Python & R In Data Science