Menu

SQL Tutorial – A Simple and Intuitive Guide to the Structured Query Language

SQL, short for Structured Query Language is a programming language used to communicate with databases and do various types of Data wrangling operations. This is an essential skillset for any type of Data related jobs. In this tutorial, let’s get started with the basics of SQL.

1. Why you should learn SQL?

First, let’s understand what is SQL.

SQL, pronounced “see-quel” or “S-Q-L”, is a programming language specifically designed for managing databases.

SQL is used to communicate with databases to retrieve and manipulate data.

Application or websites use databases to store and access data, like user information, transaction data, product details, etc. So, if you have to create such applications, you will need to know at least basic SQL.

It is particularly effective for handling structured data, i.e., data incorporating relations among entities and variables.

The History of SQL

SQL was first developed at IBM by Donald D. Chamberlin and Raymond F. Boyce in the early 1970s.

It was initially called SEQUEL (Structured English Query Language) and was designed to manipulate and retrieve data stored in IBM’s original quasi-relational database management system, System R, which a group at IBM San Jose Research Laboratory had developed.

Over time, SQL became a standard tool for managing data, and it has been maintained and standardized by the American National Standards Institute (ANSI) and the International Organization for Standardization (ISO) since 1986.

It is still effective and in-practice to this day, even modern databases adopting SQL at it’s core.

Why Learn SQL?

It is the most popular, largely accepted way to query data from large databases and do all sorts of data wrangling. It is a basic knowledge requirement expected out of Data professionals.

SQL and relational databases are everywhere!

Helpful in pursuing a Data Science Career

Almost every data related technology supports SQL in some form

Handle large amounts of structured data

Easy to learn. Get good with practice.

Learning SQL (Structured Query Language) can be advantageous for a number of reasons:

1) Data Management: SQL is a standard language for managing data held in a relational database management system (RDBMS) or for stream processing in a relational data stream management system (RDSMS).

2) In Demand Skill: Makes you more marketable for jobs. From data analysts and data scientists to software engineers and database administrators, many jobs require SQL skills.

3) Data Analysis: SQL is designed not just to manage data, but to retrieve and analyze it. You can use SQL to perform complex queries that help you understand data trends, patterns, and insights.

4) Interoperability: Almost every RDBMS (like MySQL, Oracle, SQL Server, SQLite, Postgres) uses SQL, even with some differences in dialect. This means that learning SQL gives you skills that you can use across various database systems.

5) Integrations with Other Languages: SQL integrates with other programming languages such as Python, Java, and R.

6) Handling Large Databases: SQL can handle databases of all sizes, from a few hundred entries to millions or even billions. This makes it a valuable for companies dealing with big data.

7) Long-term Relevance: SQL has been around since the 1970s and is still going strong. The importance of SQL in data management ensures its relevance for many years to come.

8) Ease of Learning: While SQL has depth and complexity. It has a short learning curve but more you practice the better you get at it.

SQL is a powerful and robust tool for extracting relevant and useful data from a large dataset.

While SQL has traditionally been the specialty of highly-trained data analysts and programmers, it’s finding greater acceptance among non-technical folks.

2. Types of SQL servers

SQL servers generally refer to a category of software that provides relational database management system (RDBMS) services, which are often used in conjunction with the SQL (Structured Query Language) programming language.

Here are several types of SQL servers provided by different software vendors:

  1. Microsoft SQL Server: A widely used commercial SQL server from Microsoft.

  2. MySQL: An open-source RDBMS popular for web applications, now owned by Oracle.

  3. PostgreSQL: An open-source object-relational database system known for its robustness and complexity.

  4. Oracle Database: A comprehensive and powerful commercial RDBMS from Oracle Corporation.

  5. MariaDB: An open-source RDBMS, and a community-developed fork of MySQL.

  6. IBM DB2: An enterprise-level, high-performance DBMS from IBM.

  7. Amazon RDS: A scalable and managed RDBMS service from Amazon Web Services.

  8. Google Cloud SQL: A fully-managed RDBMS service offered by Google Cloud.

  9. Azure SQL Database: A fully managed RDBMS service provided by Microsoft’s Azure Cloud platform.

  10. SQLite: A lightweight, disk-based database with an embedded processing model.SQLite is a C library that provides a lightweight disk-based database. It doesn’t have a separate server process like most other SQL servers.

  11. Apache HIVE: Manage petabytes of data residing in distributed storage using SQL.

The sqlite3 module in Python provides an SQL interface for interacting with SQLite databases. This module is an embedded SQL database engine and does not have a separate server process.

4. What is a Database and Table?

Before you start learning SQL, it is very important to understand what is Database and Table?

1) Database: Imagine a library. It’s a place where lots of information is stored. This information is organized in a way that you can find and access what you’re looking for.

A database is the equivalent of this library in the digital world. It’s a structured set of data. So, a database might be used to store information for a website, a business, a game, and so on. It helps us store, manage, and retrieve that information efficiently.

2) Tables: Now within this library, books are organized into different sections or shelves based on categories like ‘Science Fiction’, ‘History’, ‘Biography’ and so forth. Tables are just like these sections in our library analogy.

In a database, a table is a set of data elements (values) that is organized using a model of vertical columns (which are identified by their name) and horizontal rows, the cell being the unit where a row and column intersect.

A table has a specified number of columns, but can have any number of rows.

SQL Syntax example

Typically we write SQL Commands in CAPS. However, SQL is NOT case sensitive. So, SELECT is same as select in SQL.

SELECT ProductID, CustomerID, COUNT(*)
FROM Orders
WHERE Quantity > 0
GROUP BY ProductID, CustomerID
HAVING COUNT(*) > 5
ORDER BY ProductID ASC, CustomerID DESC;

Explanation:

  • SELECT: Retrieve specific columns or expressions from the table.
  • FROM: Specify the table from which to retrieve data.
  • WHERE: Filter the data based on specific conditions.
  • GROUP BY: Group the result set based on one or more columns.
  • HAVING: Filter the grouped data based on conditions.
  • ORDER BY: Sort the result set based on one or more columns.

5. Setup Python Environment for SQL

  1. Install Anaconda
  2. Or Use Google Colab

Check instuctions in respective video lessons in Python Course

6. Import necessary libraries and create sample data

Here is the plan:

We are going to use pandas to load a CSV file as a Pandas Dataframe. Then we will load this dataframe to SQL Database. From then on, we will do all the work / data processing in SQL.

Note:

Typically, in workplace scenarios, this step is done for you and not required. Your org will typically already have the databases with tables. Nevertheless, let’s see how to create one if not already given so you get the full picture.

#!pip install pandas
import pandas as pd
sample_df = pd.read_csv('DataSQL.csv')
sample_df.head()
Current_id Dep_Var Logit_Var Var_1 Var_2 Var_3 Var_4 Var_5
0 301897853 16428 0 -113 40000 436 105 13
1 301934672 8678 1 -120 35000 446 113 15
2 301902540 25561 1 -99 36000 456 132 14
3 301864713 8528 0 -85 21600 408 142 14
4 301937240 10128 0 -118 38000 396 139 13

Basic outline to follow while learning SQL in a structured manner

  1. Create a Database
  2. Create and Load Table in Database
  3. Create New Table from existing table
  4. Copy data from one table and insert it into already exisitng table
  5. Get list of tables in SQL DB
  6. Drop Tables in DB
  7. Delete rows in a existing Dataset using where condition
  8. Update rows/columns in a existing Dataset
  9. Select data / required vaiables from a dataset
  10. SQL Comments
  11. Limit the number of rows returned from a query
  12. Identify and Delete duplicates in a dataset
  13. Filter rows in SQL dataset
  14. Sort the data in ascending or descending order
  15. Aggregate identical data into groups
  16. Filter rows after the aggregated
  17. SQL Joins
    • 17(a) – Inner Join
    • 17(b) – Full Join
    • 17(c) – Left Join
    • 17(d) – RIght Join

1. Create DB and Run SQL in Jupyter Environment

Database in SQL is used to manage and manipulate the data.

In this Database, data is organized into tables which are related to each other based on common keys or concepts; hence the term “relational database”.

Each table consists of columns (fields) and rows (records). A unique key identifies each record in a table, and relationships between tables are created through these keys.

Python ships with a default SQL database called ‘sqlite’. Let’s connect to the db that comes with this course called jupyter_sql_tutorial.db.

STEP 1: CONNECT or CREATE

# !pip install db-sqlite3
import sqlite3

# Use sqlite3 library to create a database connection
Test_db = sqlite3.connect('jupyter_sql_tutorial.db')

If the database does not exist, it will create a new one.

# Create new database if not exist
Test_db2 = sqlite3.connect('new_database.db')

If you check the work directory, it should contain a new file named new_database.db.

Important: Install ipython-sql for the %sql function to work

# Common error: https://stackoverflow.com/questions/37149748/ipython-notebook-and-sql-importerror-no-module-named-sql-when-running-load
# !pip install ipython-sql

STEP 2: LOAD DATABASE and ATTACH

In database management systems, the “ATTACH DATABASE” statement is used to connect or link an external database to the current database session.

%load_ext sql
%sql sqlite:///jupyter_sql_tutorial.db

Important note, if you want to run this code in Jupyter notebook:

  • Add ‘%%sql’ in the beginning of each cell so Jupyter knows the cell contains SQL code.
  • Do ‘Attach Database’ so we can refer ‘jupyter_sql_tutorial.db’ as ‘test_db’.
ATTACH DATABASE 'jupyter_sql_tutorial.db' AS Test_db;

STEP 3: Save Pandas DataFrame to SQL Database (Test_db)

# Load the dataset if not already loaded.
try:
    sample_df.to_sql('sample_df', Test_db)
except:
    pass

Note: If above statement gives error that the table sample_df already exists, then it means the table has already been loaded, so no need to re-run.

View / Print sample_df Loaded into Test_db

Now, let’s read the data from the Sample_df table stored in Test_db

Notice the syntax.

SELECT *
FROM Test_db.Sample_df

This prints the entire table. Not showing here because of long output.

The above output was too long, and can easily run out of memory for large tables.

So, let’s get only 5 rows.

SELECT *
FROM Test_db.Sample_df
LIMIT 5;
index Current_id Dep_Var Logit_Var Var_1 Var_2 Var_3 Var_4 Var_5
0 301897853 16428 0 -113 40000 436 105 13
1 301934672 8678 1 -120 35000 446 113 15
2 301902540 25561 1 -99 36000 456 132 14
3 301864713 8528 0 -85 21600 408 142 14
4 301937240 10128 0 -118 38000 396 139 13

How to count the number of records in the table?

SELECT count(*)
FROM Test_db.Sample_df;

14989

How to count the columns?

That is not the right way to think about it. Let me explain.

In Python, you normally get the count of columns to reference the column by column index or do some iterations.

Whereas in SQL, we reference the columns by the respective column names. All the operations we do is vectorised (applied on the entire data).

Nevertheless, SQL does not prevent you from finding the count of columns, but how you compute it depends on what RDBMS you work on.

Here are couple of ways to do it: MySQL, SQLite

SELECT COUNT(*)
FROM pragma_table_info('Sample_df');
COUNT(*)
9

2. Data wrangling in SQL vs Python

A key difference in Data Wrangling between SQL and Python is: In Python, you can create your own functions, iterate in a customized way by writing For Loops. But in SQL, we accomplish all tasks by the limited set of predefined SQL commands.

3. Setup MySQL Workbench

Helpful Article:
1. How to install MySQL on Windows
2. How to install MySQL on MacOS
3. Official Guide: Windows, MacOS

With time, these article links may change. To get the latest just google search “Install MySQL in Windows” (or Mac) and look into the latest article.

Steps

  1. Download MYSQL installer: https://dev.mysql.com/downloads/installer/
  2. Once installation completes select the ‘Full’ setup option.

  1. In ‘Type and Networking’ screen, keep it as ‘Development Computer’ if you are using a personal PC.
    Keep the default port or you can give your own.

  1. Continue default method until it asks you to give root account password.

  1. Apply configuration and Finish.

Done! The next step is to connect to the server. Type the root account’s password and click Check

  1. In MySQL Router configuration, uncheck this box and click next.

  1. Once everything is set, you will be greeted with the workbench screen. Choose the connection to the server you had created and log into it.

  1. You can now start writing your queries.

Information on Sakila Database we use for Exercises

The sakila database is very popular, basic set of tables that comes preinstalled by default when you install MySQL server in the workbench. If it’s not present in anycase, you can find out more about the database and set it up using the following links:

  1. Structure of DB: https://dev.mysql.com/doc/sakila/en/sakila-structure.html
  2. Installation: https://dev.mysql.com/doc/sakila/en/sakila-installation.html
  3. Download link: https://dev.mysql.com/doc/index-other.html
  4. Alternate Download Link: https://www.sqliz.com/sakila/installation/

2. Create Table manually in DB

Tables are an essential component of databases as they hold all the data. When a table is created, the name of the table, names of columns, and the type of data that can be stored in the columns are defined.

The CREATE TABLE statement in SQL is used to create a new table in a database with “0 Rows”.

CREATE TABLE Students (
  StudentID int PRIMARY KEY,
  FirstName varchar(255),
  LastName varchar(255),
  Age int,
  Grade int
);

SELECT * FROM Students
StudentID FirstName LastName Age Grade

Add rows manually to Students table

INSERT INTO Students (StudentID, FirstName, LastName, Age, Grade)
VALUES (1, 'John', 'Doe', 15, 10),
       (2, 'Jane', 'Doe', 14, 9),
       (3, 'Jim', 'Beam', 16, 10),
       (4, 'Jack', 'Daniels', 15, 9);

SELECT * FROM Students
StudentID FirstName LastName Age Grade
1 John Doe 15 10
2 Jane Doe 14 9
3 Jim Beam 16 10
4 Jack Daniels 15 9

Get Table Variables types

PRAGMA table_info(Students);
cid name type notnull dflt_value pk
0 StudentID INT 0 None 1
1 FirstName varchar(255) 0 None 0
2 LastName varchar(255) 0 None 0
3 Age INT 0 None 0
4 Grade INT 0 None 0

PRAGMA is a sqlite specific command. If you are working on other databases, say MySQL, use DESCRIBE command.

DESCRIBE Students

SQL Variables Types:

Creating a table in SQL requires defining not only the column names but also the data types for each column. The most common types of data used in SQL are:

A) INT: For integer numbers

B) FLOAT or DOUBLE: For floating point numbers

C) DATE: For date values

D) TIMESTAMP: For date and time values

E) CHAR(size): For fixed length strings (size = number of characters)

F) VARCHAR(size): For variable length strings (size = maximum number of characters)

G) TEXT: For storing large amount of text

I) BIT: For storing either true or false logical values.

J) BLOB

Here is a sample SQL command to create a table with various types of variables:

CREATE TABLE Employee (
    EmployeeID INT PRIMARY KEY,
    FirstName VARCHAR(50),
    LastName VARCHAR(50),
    BirthDate DATE,
    HireDate TIMESTAMP,
    Salary DOUBLE,
    Bio TEXT,
    Photo BLOB,
    IsActive BIT
);

2b. Exercise: Create Orders table from raw data.

Try the following in Jupyter or MySQL Workbench.

  1. Create a database ‘homework’.

  2. Create an orders table in ‘homework’ containing the following fields:

    • OrderID
    • OrderDate
    • Qty
    • Customer ID
    • Product category
    • Price
  3. Then insert the following entries.
    • 1, 2023-01-01, 2, 10, “Electronics”, 110
    • 2, 2023-01-01, 2, 11, “Home”, 10
    • 3, 2023-01-01, 2, 11, “Kitchen”, 80
    • 4, 2023-01-01, 2, 12, “Electronics”, 11
    • 5, 2023-01-01, 2, 12, “Outdoor”, 19
    • 6, 2023-01-01, 2, 13, “Outdoor”, 217
    • 7, 2023-01-01, 2, 13, “Home”, 8

3. Create New Table from existing table

The “CREATE TABLE” statement copies data from one table and inserts it into a new table.

CREATE TABLE Students_test AS 
SELECT * FROM Students
WHERE Grade < 10;

SELECT * FROM Students_test
StudentID FirstName LastName Age Grade
2 Jane Doe 14 9
4 Jack Daniels 15 9

4. Show Tables from existing Database

Different RDBMS (like SQLite, MySQL.. ) have different ways of doing this. The two most common methods are shown below.

For SQLite, this info is stored in an in-built database called sqlite_master. You can query that table using the following command.

SELECT name FROM sqlite_master WHERE type='table';
name
sample_df
Students
Employee
Students_test

In MYSQL databases, you can use the SHOW TABLES command. This is very common in most RDBMS.

SHOW TABLES FROM {DB_NAME};

Ref: https://dev.mysql.com/doc/refman/8.0/en/show-tables.html

5. Insert Into Select: copy data from one table to another table

Insert Into Select is a versatile feature that can be utilized to copy data from one table and insert it into already exisitng table.

It will not copy to a new table.

INSERT INTO Students_test
SELECT *
FROM Students;

SELECT * FROM Students_test
StudentID FirstName LastName Age Grade
2 Jane Doe 14 9
4 Jack Daniels 15 9
1 John Doe 15 10
2 Jane Doe 14 9
3 Jim Beam 16 10
4 Jack Daniels 15 9

Create one more

INSERT INTO Students_test2
SELECT *
FROM Students;

SELECT * FROM Students_test2
StudentID FirstName LastName Age Grade
1 John Doe 15 10
2 Jane Doe 14 9
3 Jim Beam 16 10
4 Jack Daniels 15 9

This did not work.

Why?

Because Student_test2 table did not exist. So, you need to first create it.

CREATE TABLE Students_test2 AS 
SELECT * FROM Students
WHERE Grade < 10;

SELECT * FROM Students_test2
 * sqlite:///jupyter_sql_tutorial.db
(sqlite3.OperationalError) table Students_test2 already exists
[SQL: CREATE TABLE Students_test2 AS 
SELECT * FROM Students
WHERE Grade < 10;]
(Background on this error at: https://sqlalche.me/e/20/e3q8)

Check if new table was created

SELECT name 
FROM sqlite_master 
WHERE type='table';
name
sample_df
Students
Employee
Students_test
Students_test2

6. SQL DELETE: Delete rows in a existing Dataset using where condition

DELETE command can be used to delete a single record, multiple records, or all records from a table.

DELETE FROM Students_test2
WHERE Grade = 9;

SELECT * FROM Students_test2
StudentID FirstName LastName Age Grade
1 John Doe 15 10
3 Jim Beam 16 10

You can give more complex conditions as well.

DELETE FROM Students_test2
WHERE Grade = 9 AND AGE = 15;

SELECT * FROM Students_test2
StudentID FirstName LastName Age Grade
1 John Doe 15 10
3 Jim Beam 16 10

7. Delete a Table

You can delete / drop an entire table in a single DROP TABLE command. So, be absolutely sure before running this.

The DROP TABLE command in SQL is a Data Definition Language (DDL) statement used to remove a table definition along with all the data, indexes, triggers, constraints, and permission specifications for that table. Essentially, the command completely eliminates the table from the database.

It is important to exercise caution when using this command because once a table is dropped, all the information in the table is lost and cannot be recovered (unless you have a backup).

DROP TABLE Students_test2;

Check the DROP TABLE Command after dropping table (Students_test2) in Database

It will give an error, because, it does not exist anymore.

DROP TABLE Students_test2;
 * sqlite:///jupyter_sql_tutorial.db
(sqlite3.OperationalError) no such table: Students_test2
[SQL: DROP TABLE Students_test2;]
(Background on this error at: https://sqlalche.me/e/20/e3q8)

Check tables in the Database

SELECT name 
FROM sqlite_master 
WHERE type='table';
name
sample_df
Students
Employee
Students_test

8. SQL Update : Update rows/columns in a existing Dataset

The UPDATE statement is used to modify existing records in a table

UPDATE Students
SET Grade = Grade * 10, 
    Age = Age + 2
WHERE FirstName = 'Jane';

SELECT * FROM Students
StudentID FirstName LastName Age Grade
1 John Doe 15 10
2 Jane Doe 16 90
3 Jim Beam 16 10
4 Jack Daniels 15 9

8b. Exercise 2: Updating Table

Try the following in MySQL Workbench on the Sakila Database

  1. Set to use the database as Sakila.
  2. Update Sakila.FILM to increase all values ‘Rental_Duration’ by 5

Note:

  • The Sakila db is present as a default database in MySQL workbench.
    If not present, follow the instructions in the ‘Setup MySQL WorkBench’ lesson above to set it up.

  • Basically, you need to run the 2 SQL scripts inside ‘Sakila-db’ directory: sakila-schema.sql, then, sakila-data.sql.

9. SQL Select : Select data / required vaiables from a dataset

The SELECT statement is used to select data from a database.

Select * is used to select all colums in the table

SELECT *
FROM Students;
StudentID FirstName LastName Age Grade
1 John Doe 15 10
2 Jane Doe 16 90
3 Jim Beam 16 10
4 Jack Daniels 15 9

Select only defined columns SELECT FirstName, LastName

SELECT FirstName, LastName
FROM Students;
FirstName LastName
John Doe
Jane Doe
Jim Beam
Jack Daniels

Select Only ‘Jane’ and ‘Jack’

SELECT FirstName, LastName
FROM Students
WHERE FirstName in ("Jane", "Jack");
FirstName LastName
Jane Doe
Jack Daniels

Select all except ‘Jane’ and ‘Jack’

SELECT FirstName, LastName
FROM Students
WHERE FirstName not in ("Jane", "Jack");
FirstName LastName
John Doe
Jim Beam

Select all where LastName startd with “D”

SELECT FirstName, LastName
FROM Students
WHERE LastName LIKE ("D%");
FirstName LastName
John Doe
Jane Doe
Jack Daniels

More on such patterns in an upcoming section.

10. SQL Comments

Comments are lines of text in your SQL code that the SQL interpreter or compiler ignores

SQL offers two types of comment syntax:

  1. Single-Line Comments

  2. Multi-Line Comments

-- This is a single-line comment

/*
  This is a multi-line comment.
  The SELECT statement below retrieves all records from the Customers table.
*/

SELECT * FROM Students;
StudentID FirstName LastName Age Grade
1 John Doe 15 10
2 Jane Doe 16 90
3 Jim Beam 16 10
4 Jack Daniels 15 9

11. Limit the number of rows returned from a query

The SELECT TOP clause is a SQL command used to limit the number of rows returned from a query. This is particularly useful when you’re dealing with large databases, and you want a subset of records.

Fetching Top N Records

If you want to select the top 3 orders

The LIMIT clause is used to limit the number of records returned by the query. For example, if you set LIMIT 3, it will return only the first 3 records from the result set.

First, let’s print all rows

SELECT * 
FROM Students_test;

Only 3 rows

SELECT * 
FROM Students_test LIMIT 3;
StudentID FirstName LastName Age Grade
2 Jane Doe 14 9
4 Jack Daniels 15 9
1 John Doe 15 10

The OFFSET clause is used to specify the starting point for the rows to be returned. If you set OFFSET 2, it means that the first 2 rows will be skipped and the query will start returning records from the 3rd row.

So, when you combine LIMIT 3 OFFSET 2 in a SQL query, it means that the SQL statement will skip the first 2 records and then return the next 3 records.

SELECT * 
FROM Students_test 
LIMIT 3 OFFSET 2;
StudentID FirstName LastName Age Grade
1 John Doe 15 10
2 Jane Doe 14 9
3 Jim Beam 16 10

12. SELECT DISTINCT : Identify and Delete duplicates in a dataset

The primary function of the SELECT DISTINCT statement is to eliminate all duplicate rows and present a table of unique rows.

Sample data for understanding SELECT DISTINCT

SELECT * 
FROM Students_test 
StudentID FirstName LastName Age Grade
2 Jane Doe 14 9
4 Jack Daniels 15 9
1 John Doe 15 10
2 Jane Doe 14 9
3 Jim Beam 16 10
4 Jack Daniels 15 9

SELECT DISTINCT * Get distinct of entire row

SELECT DISTINCT *
FROM Students_test;
StudentID FirstName LastName Age Grade
2 Jane Doe 14 9
4 Jack Daniels 15 9
1 John Doe 15 10
3 Jim Beam 16 10

SELECT DISTINCT StudentID Get distinct of Selected column(s)

SELECT DISTINCT StudentID
FROM Students_test;
StudentID
2
4
1
3

13. SQL Where: Filtering rows

The SQL WHERE clause is the filtering mechanism in SQL. It specifies a condition while fetching data from a single table or by joining with multiple tables. If the given condition is satisfied, only then it returns a specific value from the table.

You could say the WHERE clause is the gatekeeper of data – only letting through the data you specifically request.

Select *
FROM Students_test
WHERE Grade < 10;
StudentID FirstName LastName Age Grade
2 Jane Doe 14 9
4 Jack Daniels 15 9
2 Jane Doe 14 9
4 Jack Daniels 15 9

Multiple conditions

Select *
FROM Students_test
WHERE Age > 14 AND Grade < 10;
StudentID FirstName LastName Age Grade
4 Jack Daniels 15 9
4 Jack Daniels 15 9

Select specific STUDENTIDs

Select *
FROM Students_test
WHERE StudentID in (2,3,4);
StudentID FirstName LastName Age Grade
2 Jane Doe 14 9
4 Jack Daniels 15 9
2 Jane Doe 14 9
3 Jim Beam 16 10
4 Jack Daniels 15 9

13b. Exercise 3: Filtering Data

Try the following in MySQL Workbench on the Sakila Database

  1. From SAKILA.FILM, select only the film titles and description with rental rate more than 2.00 and Rating is ‘PG’.

  2. From SAKILA.PAYMENTS, count how many payments were make more than $5.

Note:

  • The Sakila db is present as a default database in MySQL workbench.
    If not present, follow the instructions in the ‘Setup MySQL WorkBench’ lesson above to set it up.

  • Basically, you need to run the 2 SQL scripts inside ‘Sakila-db’ directory: sakila-schema.sql, then, sakila-data.sql.

14. SQL Order By: Sort the data in ascending or descending order

The SQL ORDER BY keyword is used to sort the result-set in ascending or descending order. The ORDER BY keyword sorts the records in ascending order by default. If you want to sort the records in descending order, you can use the DESC keyword.

SELECT * 
FROM Students_test 
ORDER BY Grade ASC;
StudentID FirstName LastName Age Grade
2 Jane Doe 14 9
4 Jack Daniels 15 9
2 Jane Doe 14 9
4 Jack Daniels 15 9
1 John Doe 15 10
3 Jim Beam 16 10

ORDER BY Grade ASC, Age DESC; Sort / Order using multiple variables and diffrent order types (ASC, DESC)

SELECT * 
FROM Students_test 
ORDER BY Grade ASC, Age DESC;
StudentID FirstName LastName Age Grade
4 Jack Daniels 15 9
4 Jack Daniels 15 9
2 Jane Doe 14 9
2 Jane Doe 14 9
3 Jim Beam 16 10
1 John Doe 15 10

15. SQL Group By : Aggregate identical data into groups

‘GROUP BY’ is an SQL command that arranges identical data into groups.

‘GROUP BY’ goes hand in hand with aggregate functions like COUNT, AVG, SUM, MAX, or MIN to provide meaningful insights from the data. It enables you to perform calculations on each group of rows rather than individual rows.

SELECT StudentID,
       FirstName, 
       AVG(Age) as AvgAge,
       SUM(Grade) as SumGrade
FROM Students_test
GROUP BY StudentID, FirstName;
StudentID FirstName AvgAge SumGrade
1 John 15.0 10
2 Jane 14.0 18
3 Jim 16.0 10
4 Jack 15.0 18

16. SQL Having: Filter rows after aggregating

Consider the following SQL statement.

SELECT StudentID,
       FirstName, 
       AVG(Age) as Age,
       SUM(Grade) as Grade
FROM Students_test
WHERE Grade <= 10
GROUP BY StudentID, FirstName;

Now consider this with the ‘HAVING’ clause

SELECT StudentID,
       FirstName, 
       AVG(Age) as Age,
       SUM(Grade) as SGrade
FROM Students_test
GROUP BY StudentID, FirstName
HAVING SGrade <= 100;
StudentID FirstName Age SGrade
1 John 15.0 10
2 Jane 14.0 18
3 Jim 16.0 10
4 Jack 15.0 18

What is the difference between “WHERE” and “HAVING” in SQL?

This is a very important interview question.

Answer:

The HAVING clause is used in SQL to filter the results of a GROUP BY clause.

While the WHERE clause can filter rows before they are aggregated, the HAVING clause filters the rows after they’ve been grouped and aggregated. This is the key distinction between the WHERE and HAVING clauses in SQL.

SELECT StudentID,
       FirstName, 
       AVG(Age) as Age,
       SUM(Grade) as SGrade
FROM Students_test
GROUP BY StudentID, FirstName
HAVING SGrade <= 100;
StudentID FirstName Age SGrade
1 John 15.0 10
2 Jane 14.0 18
3 Jim 16.0 10
4 Jack 15.0 18

The column name in the HAVING clause refers to the name the comes after the grouping is done.

17. Joining Datasets

Let’s say, you have two tables: Orders and Customer info. Both tables have the customerid in common. And you want to add Customer info to the orders data so you can see the details of each customer against every order.

How do you achieve this?

SQL Joins are used to combine rows from two or more tables based on a common column between them.

This common column is often a primary key in one table that corresponds to a foreign key in another.

Types of SQL Joins

There are four main types of SQL Joins

17(a) – INNER JOIN: Returns records that have matching values in both tables.

17(b) – FULL (OUTER) JOIN: Returns all records when there is a match in either the left or the right table.

17(c) – LEFT (OUTER) JOIN: Returns all records from the left table, and the matched records from the right table.

17(d) – RIGHT (OUTER) JOIN: Returns all records from the right table, and the matched records from the left table.

Let’s illustrate these concepts using a simple database with two tables, Customers and Orders.

Let’s illustrate these concepts using a simple database with two tables, Customers and Orders.

Customer Table:

CREATE TABLE Customers (
    CustomerID INT PRIMARY KEY,
    Name VARCHAR(255)
);

INSERT INTO Customers (CustomerID, Name)
VALUES (1, 'John'),
       (2, 'Peter'),
       (3, 'Mary'),
       (4, 'Sally');

View Customers table

select * from Customers;
CustomerID Name
1 John
2 Peter
3 Mary
4 Sally

Orders Table:

CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,
    CustomerID INT,
    Product VARCHAR(255)
);

INSERT INTO Orders (OrderID, CustomerID, Product)
VALUES (1, 1, 'Apples'),
       (2, 2, 'Bananas'),
       (3, 4, 'Grapes'),
       (4, 5, 'Oranges');

View Orders table

select * from Orders;
OrderID CustomerID Product
1 1 Apples
2 2 Bananas
3 4 Grapes
4 5 Oranges

17(a) – INNER JOIN

The INNER JOIN keyword selects records that have matching values in both tables.

SQL Statement:

SELECT Orders.*, Customers.Name 
FROM Customers
INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID;
OrderID CustomerID Product Name
1 1 Apples John
2 2 Bananas Peter
3 4 Grapes Sally

17(b) – FULL JOIN

The FULL JOIN keyword returns all records when there is a match in the left (Customers) or the right (Orders) table. Records that do not have a match in the other table are shown as NULL.

SQL Statement:

SELECT Customers.Name, Orders.Product
FROM Customers
FULL OUTER JOIN Orders ON Customers.CustomerID = Orders.CustomerID;
Name Product
John Apples
Peter Bananas
Mary None
Sally Grapes
None Oranges
Name Product
John Apples
Peter Bananas
Mary NULL
Sally Grapes
NULL Oranges
%%html
<style>
table {float:left}
</style>

17(c) – LEFT JOIN

The LEFT JOIN keyword returns all records from the left table (Customers), and the matched records from the right table (Orders). The result is NULL from the right side if there is no match.

SQL Statement:

SELECT Customers.Name, Orders.Product
FROM Customers
LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID;
Name Product
John Apples
Peter Bananas
Mary None
Sally Grapes

17(d) – RIGHT JOIN

The RIGHT JOIN keyword returns all records from the right table (Orders), and the matched records from the left table (Customers). The result is NULL from the left side when there is no match.

SQL Statement:

SELECT Customers.Name, Orders.Product
FROM Customers
RIGHT JOIN Orders ON Customers.CustomerID = Orders.CustomerID;
Name Product
John Apples
Peter Bananas
Sally Grapes
None Oranges
Name Product
John Apples
Peter Bananas
Sally Grapes
NULL Oranges

18. Exercise 4: Joining Exercise

Try the following in MySQL Workbench on the Sakila Database

  1. From Sakila Database, create a new table that contains:
    • Film name
    • Corresponding Actor name
    • Release Year
    • Rating
    • Rental_Rate

Make sure none of the films get missed out in the result.

Hint: Use Film table and Film Actor Table.

  1. What is the average rental rate for each Film actor?

Note:

  • The Sakila db is present as a default database in MySQL workbench.
    If not present, follow the instructions in the ‘Setup MySQL WorkBench’ lesson above to set it up.

  • Basically, you need to run the 2 SQL scripts inside ‘Sakila-db’ directory: sakila-schema.sql, then, sakila-data.sql.

19. SQL DateTime Operations

SQL provides a set of functions for working with dates and times. These functions let you manipulate and format date and time values in your SQL queries.

1) Extract the Date and time from a DateTime
2) Extract Year, Month, and Day
3) Extract Hours, Minutes, and Seconds
4) Increment and decrement Date Column by Days, Months and Years
5) Current date, time, and timestamp
6) Calculate Date Difference in Days, Years and Months

CREATE TABLE Employees (
    ID INT PRIMARY KEY,
    Name VARCHAR(100),
    HireDate DATETIME
);

INSERT INTO Employees (ID, Name, HireDate)
VALUES 
(1, 'John Doe', '2017-03-12 09:30:10'),
(2, 'Jane Smith', '2018-05-14 15:20:08'),
(3, 'Emily Johnson', '2019-07-30 13:15:15'),
(4, 'Robert Brown', '2020-10-20 10:00:35');

Select * from Employees;
ID Name HireDate
1 John Doe 2017-03-12 09:30:10
2 Jane Smith 2018-05-14 15:20:08
3 Emily Johnson 2019-07-30 13:15:15
4 Robert Brown 2020-10-20 10:00:35

1) Extract the Date and time from a DateTime

You can extract the date part from a DateTime using the DATE() function.

SELECT *, 
      DATE(HireDate) AS 'Hire Date',
      TIME(HireDate) AS 'Hire Time'
FROM Employees;
ID Name HireDate Hire Date Hire Time
1 John Doe 2017-03-12 09:30:10 2017-03-12 09:30:10
2 Jane Smith 2018-05-14 15:20:08 2018-05-14 15:20:08
3 Emily Johnson 2019-07-30 13:15:15 2019-07-30 13:15:15
4 Robert Brown 2020-10-20 10:00:35 2020-10-20 10:00:35

2. Extract Year, Month, and Day

You can use the strftime('%Y', date_col), strftime('%m', date_col), and strftime('%d', date_col) functions to extract the respective date components from a DateTime value.

SELECT *, 
    strftime('%Y', HireDate) as 'HireYear',
    strftime('%m', HireDate) as 'HireMonth',
    strftime('%d', HireDate) as 'HireDay'
FROM Employees;
ID Name HireDate HireYear HireMonth HireDay
1 John Doe 2017-03-12 09:30:10 2017 03 12
2 Jane Smith 2018-05-14 15:20:08 2018 05 14
3 Emily Johnson 2019-07-30 13:15:15 2019 07 30
4 Robert Brown 2020-10-20 10:00:35 2020 10 20

3. Extract Hours, Minutes, and Seconds

You can use the strftime('%H', date_col), strftime('%M', date_col), and strftime('%S', date_col) functions to extract the respective date components from a DateTime value.

SELECT *, 
    strftime('%H', HireDate) as 'HireHour',
    strftime('%M', HireDate) as 'HireMinute',
    strftime('%S', HireDate) as 'HireSecond'
FROM Employees;
ID Name HireDate HireHour HireMinute HireSecond
1 John Doe 2017-03-12 09:30:10 09 30 10
2 Jane Smith 2018-05-14 15:20:08 15 20 08
3 Emily Johnson 2019-07-30 13:15:15 13 15 15
4 Robert Brown 2020-10-20 10:00:35 10 00 35

4. Increment and decrement Date Column by Days, Months and Years

SELECT *,
    DATETIME(HireDate, '+10 days') AS Add_Days,
    DATETIME(HireDate, '-1 months') AS Add_Months,
    DATETIME(HireDate, '+1 years') AS Add_Years

FROM Employees;
ID Name HireDate Add_Days Add_Months Add_Years
1 John Doe 2017-03-12 09:30:10 2017-03-22 09:30:10 2017-02-12 09:30:10 2018-03-12 09:30:10
2 Jane Smith 2018-05-14 15:20:08 2018-05-24 15:20:08 2018-04-14 15:20:08 2019-05-14 15:20:08
3 Emily Johnson 2019-07-30 13:15:15 2019-08-09 13:15:15 2019-06-30 13:15:15 2020-07-30 13:15:15
4 Robert Brown 2020-10-20 10:00:35 2020-10-30 10:00:35 2020-09-20 10:00:35 2021-10-20 10:00:35

5. Current date, time, and timestamp: CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP functions will return the current date, time, and timestamp respectively.

SELECT CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP;
CURRENT_DATE CURRENT_TIME CURRENT_TIMESTAMP
2023-08-20 07:27:36 2023-08-20 07:27:36

6. Calculate Date Difference in Days, Years and Months

SELECT *,
round(julianday('now') - julianday(HireDate),0) As Date_Diff_Days,
round((julianday('now') - julianday(HireDate))/30.44,0) As Date_Diff_Months,
round((julianday('now') - julianday(HireDate))/365.25,0) As Date_Diff_Years
FROM Employees;
ID Name HireDate Date_Diff_Days Date_Diff_Months Date_Diff_Years
1 John Doe 2017-03-12 09:30:10 2352.0 77.0 6.0
2 Jane Smith 2018-05-14 15:20:08 1924.0 63.0 5.0
3 Emily Johnson 2019-07-30 13:15:15 1482.0 49.0 4.0
4 Robert Brown 2020-10-20 10:00:35 1034.0 34.0 3.0

Course Preview

Machine Learning A-Z™: Hands-On Python & R In Data Science

Free Sample Videos:

Machine Learning A-Z™: Hands-On Python & R In Data Science

Machine Learning A-Z™: Hands-On Python & R In Data Science

Machine Learning A-Z™: Hands-On Python & R In Data Science

Machine Learning A-Z™: Hands-On Python & R In Data Science

Machine Learning A-Z™: Hands-On Python & R In Data Science