SQL Tutorial – A Simple and Intuitive Guide to the Structured Query Language

SQL, short for Structured Query Language is a programming language used to communicate with databases and do various types of Data wrangling operations. This is an essential skillset for any type of Data related jobs. In this tutorial, let’s get started with the basics of SQL.

1. Why you should learn SQL?

First, let’s understand what is SQL.

SQL, pronounced “see-quel” or “S-Q-L”, is a programming language specifically designed for managing databases.

SQL is used to communicate with databases to retrieve and manipulate data.

Application or websites use databases to store and access data, like user information, transaction data, product details, etc. So, if you have to create such applications, you will need to know at least basic SQL.

It is particularly effective for handling structured data, i.e., data incorporating relations among entities and variables.

The History of SQL

SQL was first developed at IBM by Donald D. Chamberlin and Raymond F. Boyce in the early 1970s.

It was initially called SEQUEL (Structured English Query Language) and was designed to manipulate and retrieve data stored in IBM’s original quasi-relational database management system, System R, which a group at IBM San Jose Research Laboratory had developed.

Over time, SQL became a standard tool for managing data, and it has been maintained and standardized by the American National Standards Institute (ANSI) and the International Organization for Standardization (ISO) since 1986.

It is still effective and in-practice to this day, even modern databases adopting SQL at it’s core.

Why Learn SQL?

It is the most popular, largely accepted way to query data from large databases and do all sorts of data wrangling. It is a basic knowledge requirement expected out of Data professionals.

SQL and relational databases are everywhere!

Helpful in pursuing a Data Science Career

Almost every data related technology supports SQL in some form

Handle large amounts of structured data

Easy to learn. Get good with practice.

Learning SQL (Structured Query Language) can be advantageous for a number of reasons:

1) Data Management: SQL is a standard language for managing data held in a relational database management system (RDBMS) or for stream processing in a relational data stream management system (RDSMS).

2) In Demand Skill: Makes you more marketable for jobs. From data analysts and data scientists to software engineers and database administrators, many jobs require SQL skills.

3) Data Analysis: SQL is designed not just to manage data, but to retrieve and analyze it. You can use SQL to perform complex queries that help you understand data trends, patterns, and insights.

4) Interoperability: Almost every RDBMS (like MySQL, Oracle, SQL Server, SQLite, Postgres) uses SQL, even with some differences in dialect. This means that learning SQL gives you skills that you can use across various database systems.

5) Integrations with Other Languages: SQL integrates with other programming languages such as Python, Java, and R.

6) Handling Large Databases: SQL can handle databases of all sizes, from a few hundred entries to millions or even billions. This makes it a valuable for companies dealing with big data.

7) Long-term Relevance: SQL has been around since the 1970s and is still going strong. The importance of SQL in data management ensures its relevance for many years to come.

8) Ease of Learning: While SQL has depth and complexity. It has a short learning curve but more you practice the better you get at it.

SQL is a powerful and robust tool for extracting relevant and useful data from a large dataset.

While SQL has traditionally been the specialty of highly-trained data analysts and programmers, it’s finding greater acceptance among non-technical folks.

2. Types of SQL servers

SQL servers generally refer to a category of software that provides relational database management system (RDBMS) services, which are often used in conjunction with the SQL (Structured Query Language) programming language.

Here are several types of SQL servers provided by different software vendors:

Microsoft SQL Server: A widely used commercial SQL server from Microsoft.
MySQL: An open-source RDBMS popular for web applications, now owned by Oracle.
PostgreSQL: An open-source object-relational database system known for its robustness and complexity.
Oracle Database: A comprehensive and powerful commercial RDBMS from Oracle Corporation.
MariaDB: An open-source RDBMS, and a community-developed fork of MySQL.
IBM DB2: An enterprise-level, high-performance DBMS from IBM.
Amazon RDS: A scalable and managed RDBMS service from Amazon Web Services.
Google Cloud SQL: A fully-managed RDBMS service offered by Google Cloud.
Azure SQL Database: A fully managed RDBMS service provided by Microsoft’s Azure Cloud platform.
SQLite: A lightweight, disk-based database with an embedded processing model.SQLite is a C library that provides a lightweight disk-based database. It doesn’t have a separate server process like most other SQL servers.
Apache HIVE: Manage petabytes of data residing in distributed storage using SQL.

The sqlite3 module in Python provides an SQL interface for interacting with SQLite databases. This module is an embedded SQL database engine and does not have a separate server process.

4. What is a Database and Table?

Before you start learning SQL, it is very important to understand what is Database and Table?

1) Database: Imagine a library. It’s a place where lots of information is stored. This information is organized in a way that you can find and access what you’re looking for.

A database is the equivalent of this library in the digital world. It’s a structured set of data. So, a database might be used to store information for a website, a business, a game, and so on. It helps us store, manage, and retrieve that information efficiently.

2) Tables: Now within this library, books are organized into different sections or shelves based on categories like ‘Science Fiction’, ‘History’, ‘Biography’ and so forth. Tables are just like these sections in our library analogy.

In a database, a table is a set of data elements (values) that is organized using a model of vertical columns (which are identified by their name) and horizontal rows, the cell being the unit where a row and column intersect.

A table has a specified number of columns, but can have any number of rows.

SQL Syntax example

Typically we write SQL Commands in CAPS. However, SQL is NOT case sensitive. So, SELECT is same as select in SQL.

SELECT ProductID, CustomerID, COUNT(*)
FROM Orders
WHERE Quantity > 0
GROUP BY ProductID, CustomerID
HAVING COUNT(*) > 5
ORDER BY ProductID ASC, CustomerID DESC;

Explanation:

SELECT: Retrieve specific columns or expressions from the table.
FROM: Specify the table from which to retrieve data.
WHERE: Filter the data based on specific conditions.
GROUP BY: Group the result set based on one or more columns.
HAVING: Filter the grouped data based on conditions.
ORDER BY: Sort the result set based on one or more columns.

5. Setup Python Environment for SQL

Install Anaconda
Or Use Google Colab

Check instuctions in respective video lessons in Python Course

6. Import necessary libraries and create sample data

Here is the plan:

We are going to use pandas to load a CSV file as a Pandas Dataframe. Then we will load this dataframe to SQL Database. From then on, we will do all the work / data processing in SQL.

Note:

Typically, in workplace scenarios, this step is done for you and not required. Your org will typically already have the databases with tables. Nevertheless, let’s see how to create one if not already given so you get the full picture.

#!pip install pandas
import pandas as pd
sample_df = pd.read_csv('DataSQL.csv')
sample_df.head()

	Current_id	Dep_Var	Logit_Var	Var_1	Var_2	Var_3	Var_4	Var_5
0	301897853	16428	0	-113	40000	436	105	13
1	301934672	8678	1	-120	35000	446	113	15
2	301902540	25561	1	-99	36000	456	132	14
3	301864713	8528	0	-85	21600	408	142	14
4	301937240	10128	0	-118	38000	396	139	13

Basic outline to follow while learning SQL in a structured manner

Create a Database
Create and Load Table in Database
Create New Table from existing table
Copy data from one table and insert it into already exisitng table
Get list of tables in SQL DB
Drop Tables in DB
Delete rows in a existing Dataset using where condition
Update rows/columns in a existing Dataset
Select data / required vaiables from a dataset
SQL Comments
Limit the number of rows returned from a query
Identify and Delete duplicates in a dataset
Filter rows in SQL dataset
Sort the data in ascending or descending order
Aggregate identical data into groups
Filter rows after the aggregated
SQL Joins
- 17(a) – Inner Join
- 17(b) – Full Join
- 17(c) – Left Join
- 17(d) – RIght Join

1. Create DB and Run SQL in Jupyter Environment

Database in SQL is used to manage and manipulate the data.

In this Database, data is organized into tables which are related to each other based on common keys or concepts; hence the term “relational database”.

Each table consists of columns (fields) and rows (records). A unique key identifies each record in a table, and relationships between tables are created through these keys.

Python ships with a default SQL database called ‘sqlite’. Let’s connect to the db that comes with this course called jupyter_sql_tutorial.db.

STEP 1: CONNECT or CREATE

# !pip install db-sqlite3
import sqlite3

# Use sqlite3 library to create a database connection
Test_db = sqlite3.connect('jupyter_sql_tutorial.db')

If the database does not exist, it will create a new one.

# Create new database if not exist
Test_db2 = sqlite3.connect('new_database.db')

If you check the work directory, it should contain a new file named new_database.db.

Important: Install ipython-sql for the %sql function to work

# Common error: https://stackoverflow.com/questions/37149748/ipython-notebook-and-sql-importerror-no-module-named-sql-when-running-load
# !pip install ipython-sql

STEP 2: LOAD DATABASE and ATTACH

In database management systems, the “ATTACH DATABASE” statement is used to connect or link an external database to the current database session.

%load_ext sql
%sql sqlite:///jupyter_sql_tutorial.db

Important note, if you want to run this code in Jupyter notebook:

Add ‘%%sql’ in the beginning of each cell so Jupyter knows the cell contains SQL code.
Do ‘Attach Database’ so we can refer ‘jupyter_sql_tutorial.db’ as ‘test_db’.

ATTACH DATABASE 'jupyter_sql_tutorial.db' AS Test_db;

STEP 3: Save Pandas DataFrame to SQL Database (Test_db)

# Load the dataset if not already loaded.
try:
    sample_df.to_sql('sample_df', Test_db)
except:
    pass

Note: If above statement gives error that the table sample_df already exists, then it means the table has already been loaded, so no need to re-run.

View / Print sample_df Loaded into Test_db

Now, let’s read the data from the Sample_df table stored in Test_db

Notice the syntax.

SELECT *
FROM Test_db.Sample_df

This prints the entire table. Not showing here because of long output.

The above output was too long, and can easily run out of memory for large tables.

So, let’s get only 5 rows.

SELECT *
FROM Test_db.Sample_df
LIMIT 5;

index	Current_id	Dep_Var	Logit_Var	Var_1	Var_2	Var_3	Var_4	Var_5
0	301897853	16428	0	-113	40000	436	105	13
1	301934672	8678	1	-120	35000	446	113	15
2	301902540	25561	1	-99	36000	456	132	14
3	301864713	8528	0	-85	21600	408	142	14
4	301937240	10128	0	-118	38000	396	139	13

How to count the number of records in the table?

SELECT count(*)
FROM Test_db.Sample_df;

14989

How to count the columns?

That is not the right way to think about it. Let me explain.

In Python, you normally get the count of columns to reference the column by column index or do some iterations.

Whereas in SQL, we reference the columns by the respective column names. All the operations we do is vectorised (applied on the entire data).

Nevertheless, SQL does not prevent you from finding the count of columns, but how you compute it depends on what RDBMS you work on.

Here are couple of ways to do it: MySQL, SQLite

SELECT COUNT(*)
FROM pragma_table_info('Sample_df');

COUNT(*)
9

2. Data wrangling in SQL vs Python

A key difference in Data Wrangling between SQL and Python is: In Python, you can create your own functions, iterate in a customized way by writing For Loops. But in SQL, we accomplish all tasks by the limited set of predefined SQL commands.

3. Setup MySQL Workbench

Helpful Article:
1. How to install MySQL on Windows
2. How to install MySQL on MacOS
3. Official Guide: Windows, MacOS

With time, these article links may change. To get the latest just google search “Install MySQL in Windows” (or Mac) and look into the latest article.

Steps

Download MYSQL installer: https://dev.mysql.com/downloads/installer/
Once installation completes select the ‘Full’ setup option.

In ‘Type and Networking’ screen, keep it as ‘Development Computer’ if you are using a personal PC.
Keep the default port or you can give your own.

Continue default method until it asks you to give root account password.

Apply configuration and Finish.

Done! The next step is to connect to the server. Type the root account’s password and click Check

In MySQL Router configuration, uncheck this box and click next.

Once everything is set, you will be greeted with the workbench screen. Choose the connection to the server you had created and log into it.

You can now start writing your queries.

Information on Sakila Database we use for Exercises

The sakila database is very popular, basic set of tables that comes preinstalled by default when you install MySQL server in the workbench. If it’s not present in anycase, you can find out more about the database and set it up using the following links:

Structure of DB: https://dev.mysql.com/doc/sakila/en/sakila-structure.html
Installation: https://dev.mysql.com/doc/sakila/en/sakila-installation.html
Download link: https://dev.mysql.com/doc/index-other.html
Alternate Download Link: https://www.sqliz.com/sakila/installation/

2. Create Table manually in DB

Tables are an essential component of databases as they hold all the data. When a table is created, the name of the table, names of columns, and the type of data that can be stored in the columns are defined.

The CREATE TABLE statement in SQL is used to create a new table in a database with “0 Rows”.

CREATE TABLE Students (
  StudentID int PRIMARY KEY,
  FirstName varchar(255),
  LastName varchar(255),
  Age int,
  Grade int
);

SELECT * FROM Students

StudentID	FirstName	LastName	Age	Grade

Add rows manually to Students table

INSERT INTO Students (StudentID, FirstName, LastName, Age, Grade)
VALUES (1, 'John', 'Doe', 15, 10),
       (2, 'Jane', 'Doe', 14, 9),
       (3, 'Jim', 'Beam', 16, 10),
       (4, 'Jack', 'Daniels', 15, 9);

SELECT * FROM Students

StudentID	FirstName	LastName	Age	Grade
1	John	Doe	15	10
2	Jane	Doe	14	9
3	Jim	Beam	16	10
4	Jack	Daniels	15	9

Get Table Variables types

PRAGMA table_info(Students);

cid	name	type	dflt_value	pk
0	StudentID	INT	None	1
1	FirstName	varchar(255)	None	0
2	LastName	varchar(255)	None	0
3	Age	INT	None	0
4	Grade	INT	None	0

PRAGMA is a sqlite specific command. If you are working on other databases, say MySQL, use DESCRIBE command.

DESCRIBE Students

SQL Variables Types:

Creating a table in SQL requires defining not only the column names but also the data types for each column. The most common types of data used in SQL are:

A) INT: For integer numbers

B) FLOAT or DOUBLE: For floating point numbers

C) DATE: For date values

D) TIMESTAMP: For date and time values

E) CHAR(size): For fixed length strings (size = number of characters)

F) VARCHAR(size): For variable length strings (size = maximum number of characters)

G) TEXT: For storing large amount of text

I) BIT: For storing either true or false logical values.

J) BLOB

Here is a sample SQL command to create a table with various types of variables:

CREATE TABLE Employee (
    EmployeeID INT PRIMARY KEY,
    FirstName VARCHAR(50),
    LastName VARCHAR(50),
    BirthDate DATE,
    HireDate TIMESTAMP,
    Salary DOUBLE,
    Bio TEXT,
    Photo BLOB,
    IsActive BIT
);

2b. Exercise: Create Orders table from raw data.

Try the following in Jupyter or MySQL Workbench.

Create a database ‘homework’.
Create an orders table in ‘homework’ containing the following fields:
- OrderID
- OrderDate
- Qty
- Customer ID
- Product category
- Price
Then insert the following entries.
- 1, 2023-01-01, 2, 10, “Electronics”, 110
- 2, 2023-01-01, 2, 11, “Home”, 10
- 3, 2023-01-01, 2, 11, “Kitchen”, 80
- 4, 2023-01-01, 2, 12, “Electronics”, 11
- 5, 2023-01-01, 2, 12, “Outdoor”, 19
- 6, 2023-01-01, 2, 13, “Outdoor”, 217
- 7, 2023-01-01, 2, 13, “Home”, 8

3. Create New Table from existing table

The “CREATE TABLE” statement copies data from one table and inserts it into a new table.

CREATE TABLE Students_test AS 
SELECT * FROM Students
WHERE Grade < 10;

SELECT * FROM Students_test

StudentID	FirstName	LastName	Age	Grade
2	Jane	Doe	14	9
4	Jack	Daniels	15	9

4. Show Tables from existing Database

Different RDBMS (like SQLite, MySQL.. ) have different ways of doing this. The two most common methods are shown below.

For SQLite, this info is stored in an in-built database called sqlite_master. You can query that table using the following command.

SELECT name FROM sqlite_master WHERE type='table';

name
sample_df
Students
Employee
Students_test

In MYSQL databases, you can use the SHOW TABLES command. This is very common in most RDBMS.

SHOW TABLES FROM {DB_NAME};

Ref: https://dev.mysql.com/doc/refman/8.0/en/show-tables.html

5. Insert Into Select: copy data from one table to another table

Insert Into Select is a versatile feature that can be utilized to copy data from one table and insert it into already exisitng table.

It will not copy to a new table.

INSERT INTO Students_test
SELECT *
FROM Students;

SELECT * FROM Students_test

StudentID	FirstName	LastName	Age	Grade
2	Jane	Doe	14	9
4	Jack	Daniels	15	9
1	John	Doe	15	10
2	Jane	Doe	14	9
3	Jim	Beam	16	10
4	Jack	Daniels	15	9

Create one more

INSERT INTO Students_test2
SELECT *
FROM Students;

SELECT * FROM Students_test2

StudentID	FirstName	LastName	Age	Grade
1	John	Doe	15	10
2	Jane	Doe	14	9
3	Jim	Beam	16	10
4	Jack	Daniels	15	9

This did not work.

Why?

Because Student_test2 table did not exist. So, you need to first create it.

CREATE TABLE Students_test2 AS 
SELECT * FROM Students
WHERE Grade < 10;

SELECT * FROM Students_test2

 * sqlite:///jupyter_sql_tutorial.db
(sqlite3.OperationalError) table Students_test2 already exists
[SQL: CREATE TABLE Students_test2 AS 
SELECT * FROM Students
WHERE Grade < 10;]
(Background on this error at: https://sqlalche.me/e/20/e3q8)

Check if new table was created

SELECT name 
FROM sqlite_master 
WHERE type='table';

name
sample_df
Students
Employee
Students_test
Students_test2

6. SQL DELETE: Delete rows in a existing Dataset using where condition

DELETE command can be used to delete a single record, multiple records, or all records from a table.

DELETE FROM Students_test2
WHERE Grade = 9;

SELECT * FROM Students_test2

StudentID	FirstName	LastName	Age	Grade
1	John	Doe	15	10
3	Jim	Beam	16	10

You can give more complex conditions as well.

DELETE FROM Students_test2
WHERE Grade = 9 AND AGE = 15;

SELECT * FROM Students_test2

StudentID	FirstName	LastName	Age	Grade
1	John	Doe	15	10
3	Jim	Beam	16	10

7. Delete a Table

You can delete / drop an entire table in a single DROP TABLE command. So, be absolutely sure before running this.

The DROP TABLE command in SQL is a Data Definition Language (DDL) statement used to remove a table definition along with all the data, indexes, triggers, constraints, and permission specifications for that table. Essentially, the command completely eliminates the table from the database.

It is important to exercise caution when using this command because once a table is dropped, all the information in the table is lost and cannot be recovered (unless you have a backup).

DROP TABLE Students_test2;

Check the DROP TABLE Command after dropping table (Students_test2) in Database

It will give an error, because, it does not exist anymore.

DROP TABLE Students_test2;

 * sqlite:///jupyter_sql_tutorial.db
(sqlite3.OperationalError) no such table: Students_test2
[SQL: DROP TABLE Students_test2;]
(Background on this error at: https://sqlalche.me/e/20/e3q8)

Check tables in the Database

SELECT name 
FROM sqlite_master 
WHERE type='table';

name
sample_df
Students
Employee
Students_test

8. SQL Update : Update rows/columns in a existing Dataset

The UPDATE statement is used to modify existing records in a table

UPDATE Students
SET Grade = Grade * 10, 
    Age = Age + 2
WHERE FirstName = 'Jane';

SELECT * FROM Students

StudentID	FirstName	LastName	Age	Grade
1	John	Doe	15	10
2	Jane	Doe	16	90
3	Jim	Beam	16	10
4	Jack	Daniels	15	9

8b. Exercise 2: Updating Table

Try the following in MySQL Workbench on the Sakila Database

Set to use the database as Sakila.
Update Sakila.FILM to increase all values ‘Rental_Duration’ by 5

Note:

The Sakila db is present as a default database in MySQL workbench.
If not present, follow the instructions in the ‘Setup MySQL WorkBench’ lesson above to set it up.
Basically, you need to run the 2 SQL scripts inside ‘Sakila-db’ directory: sakila-schema.sql, then, sakila-data.sql.

9. SQL Select : Select data / required vaiables from a dataset

The SELECT statement is used to select data from a database.

Select * is used to select all colums in the table

SELECT *
FROM Students;

StudentID	FirstName	LastName	Age	Grade
1	John	Doe	15	10
2	Jane	Doe	16	90
3	Jim	Beam	16	10
4	Jack	Daniels	15	9

Select only defined columns SELECT FirstName, LastName

SELECT FirstName, LastName
FROM Students;

FirstName	LastName
John	Doe
Jane	Doe
Jim	Beam
Jack	Daniels

Select Only ‘Jane’ and ‘Jack’

SELECT FirstName, LastName
FROM Students
WHERE FirstName in ("Jane", "Jack");

FirstName	LastName
Jane	Doe
Jack	Daniels

Select all except ‘Jane’ and ‘Jack’

SELECT FirstName, LastName
FROM Students
WHERE FirstName not in ("Jane", "Jack");

FirstName	LastName
John	Doe
Jim	Beam

Select all where LastName startd with “D”

SELECT FirstName, LastName
FROM Students
WHERE LastName LIKE ("D%");

FirstName	LastName
John	Doe
Jane	Doe
Jack	Daniels

More on such patterns in an upcoming section.

10. SQL Comments

Comments are lines of text in your SQL code that the SQL interpreter or compiler ignores

SQL offers two types of comment syntax:

Single-Line Comments
Multi-Line Comments

-- This is a single-line comment

/*
  This is a multi-line comment.
  The SELECT statement below retrieves all records from the Customers table.
*/

SELECT * FROM Students;

StudentID	FirstName	LastName	Age	Grade
1	John	Doe	15	10
2	Jane	Doe	16	90
3	Jim	Beam	16	10
4	Jack	Daniels	15	9

11. Limit the number of rows returned from a query

The SELECT TOP clause is a SQL command used to limit the number of rows returned from a query. This is particularly useful when you’re dealing with large databases, and you want a subset of records.

Fetching Top N Records

If you want to select the top 3 orders

The LIMIT clause is used to limit the number of records returned by the query. For example, if you set LIMIT 3, it will return only the first 3 records from the result set.

First, let’s print all rows

SELECT * 
FROM Students_test;

Only 3 rows

SELECT * 
FROM Students_test LIMIT 3;

StudentID	FirstName	LastName	Age	Grade
2	Jane	Doe	14	9
4	Jack	Daniels	15	9
1	John	Doe	15	10

The OFFSET clause is used to specify the starting point for the rows to be returned. If you set OFFSET 2, it means that the first 2 rows will be skipped and the query will start returning records from the 3rd row.

So, when you combine LIMIT 3 OFFSET 2 in a SQL query, it means that the SQL statement will skip the first 2 records and then return the next 3 records.

SELECT * 
FROM Students_test 
LIMIT 3 OFFSET 2;

StudentID	FirstName	LastName	Age	Grade
1	John	Doe	15	10
2	Jane	Doe	14	9
3	Jim	Beam	16	10

12. SELECT DISTINCT : Identify and Delete duplicates in a dataset

The primary function of the SELECT DISTINCT statement is to eliminate all duplicate rows and present a table of unique rows.

Sample data for understanding SELECT DISTINCT

SELECT * 
FROM Students_test

StudentID	FirstName	LastName	Age	Grade
2	Jane	Doe	14	9
4	Jack	Daniels	15	9
1	John	Doe	15	10
2	Jane	Doe	14	9
3	Jim	Beam	16	10
4	Jack	Daniels	15	9

SELECT DISTINCT * Get distinct of entire row

SELECT DISTINCT *
FROM Students_test;

StudentID	FirstName	LastName	Age	Grade
2	Jane	Doe	14	9
4	Jack	Daniels	15	9
1	John	Doe	15	10
3	Jim	Beam	16	10

SELECT DISTINCT StudentID Get distinct of Selected column(s)

SELECT DISTINCT StudentID
FROM Students_test;

StudentID
2
4
1
3

13. SQL Where: Filtering rows

The SQL WHERE clause is the filtering mechanism in SQL. It specifies a condition while fetching data from a single table or by joining with multiple tables. If the given condition is satisfied, only then it returns a specific value from the table.

You could say the WHERE clause is the gatekeeper of data – only letting through the data you specifically request.

Select *
FROM Students_test
WHERE Grade < 10;

StudentID	FirstName	LastName	Age	Grade
2	Jane	Doe	14	9
4	Jack	Daniels	15	9
2	Jane	Doe	14	9
4	Jack	Daniels	15	9

Multiple conditions

Select *
FROM Students_test
WHERE Age > 14 AND Grade < 10;

StudentID	FirstName	LastName	Age	Grade
4	Jack	Daniels	15	9
4	Jack	Daniels	15	9

Select specific STUDENTIDs

Select *
FROM Students_test
WHERE StudentID in (2,3,4);

StudentID	FirstName	LastName	Age	Grade
2	Jane	Doe	14	9
4	Jack	Daniels	15	9
2	Jane	Doe	14	9
3	Jim	Beam	16	10
4	Jack	Daniels	15	9

13b. Exercise 3: Filtering Data

Try the following in MySQL Workbench on the Sakila Database

From SAKILA.FILM, select only the film titles and description with rental rate more than 2.00 and Rating is ‘PG’.
From SAKILA.PAYMENTS, count how many payments were make more than $5.

Note:

The Sakila db is present as a default database in MySQL workbench.
If not present, follow the instructions in the ‘Setup MySQL WorkBench’ lesson above to set it up.
Basically, you need to run the 2 SQL scripts inside ‘Sakila-db’ directory: sakila-schema.sql, then, sakila-data.sql.

14. SQL Order By: Sort the data in ascending or descending order

The SQL ORDER BY keyword is used to sort the result-set in ascending or descending order. The ORDER BY keyword sorts the records in ascending order by default. If you want to sort the records in descending order, you can use the DESC keyword.

SELECT * 
FROM Students_test 
ORDER BY Grade ASC;

StudentID	FirstName	LastName	Age	Grade
2	Jane	Doe	14	9
4	Jack	Daniels	15	9
2	Jane	Doe	14	9
4	Jack	Daniels	15	9
1	John	Doe	15	10
3	Jim	Beam	16	10

ORDER BY Grade ASC, Age DESC; Sort / Order using multiple variables and diffrent order types (ASC, DESC)

SELECT * 
FROM Students_test 
ORDER BY Grade ASC, Age DESC;

StudentID	FirstName	LastName	Age	Grade
4	Jack	Daniels	15	9
4	Jack	Daniels	15	9
2	Jane	Doe	14	9
2	Jane	Doe	14	9
3	Jim	Beam	16	10
1	John	Doe	15	10

15. SQL Group By : Aggregate identical data into groups

‘GROUP BY’ is an SQL command that arranges identical data into groups.

‘GROUP BY’ goes hand in hand with aggregate functions like COUNT, AVG, SUM, MAX, or MIN to provide meaningful insights from the data. It enables you to perform calculations on each group of rows rather than individual rows.

SELECT StudentID,
       FirstName, 
       AVG(Age) as AvgAge,
       SUM(Grade) as SumGrade
FROM Students_test
GROUP BY StudentID, FirstName;

StudentID	FirstName	AvgAge	SumGrade
1	John	15.0	10
2	Jane	14.0	18
3	Jim	16.0	10
4	Jack	15.0	18

16. SQL Having: Filter rows after aggregating

Consider the following SQL statement.

SELECT StudentID,
       FirstName, 
       AVG(Age) as Age,
       SUM(Grade) as Grade
FROM Students_test
WHERE Grade <= 10
GROUP BY StudentID, FirstName;

Now consider this with the ‘HAVING’ clause

SELECT StudentID,
       FirstName, 
       AVG(Age) as Age,
       SUM(Grade) as SGrade
FROM Students_test
GROUP BY StudentID, FirstName
HAVING SGrade <= 100;

StudentID	FirstName	Age	SGrade
1	John	15.0	10
2	Jane	14.0	18
3	Jim	16.0	10
4	Jack	15.0	18

What is the difference between “WHERE” and “HAVING” in SQL?

This is a very important interview question.

Answer:

The HAVING clause is used in SQL to filter the results of a GROUP BY clause.

While the WHERE clause can filter rows before they are aggregated, the HAVING clause filters the rows after they’ve been grouped and aggregated. This is the key distinction between the WHERE and HAVING clauses in SQL.

SELECT StudentID,
       FirstName, 
       AVG(Age) as Age,
       SUM(Grade) as SGrade
FROM Students_test
GROUP BY StudentID, FirstName
HAVING SGrade <= 100;

StudentID	FirstName	Age	SGrade
1	John	15.0	10
2	Jane	14.0	18
3	Jim	16.0	10
4	Jack	15.0	18

The column name in the HAVING clause refers to the name the comes after the grouping is done.

17. Joining Datasets

Let’s say, you have two tables: Orders and Customer info. Both tables have the customerid in common. And you want to add Customer info to the orders data so you can see the details of each customer against every order.

How do you achieve this?

SQL Joins are used to combine rows from two or more tables based on a common column between them.

This common column is often a primary key in one table that corresponds to a foreign key in another.

Types of SQL Joins

There are four main types of SQL Joins

17(a) – INNER JOIN: Returns records that have matching values in both tables.

17(b) – FULL (OUTER) JOIN: Returns all records when there is a match in either the left or the right table.

17(c) – LEFT (OUTER) JOIN: Returns all records from the left table, and the matched records from the right table.

17(d) – RIGHT (OUTER) JOIN: Returns all records from the right table, and the matched records from the left table.

Let’s illustrate these concepts using a simple database with two tables, Customers and Orders.

Let’s illustrate these concepts using a simple database with two tables, Customers and Orders.

Customer Table:

CREATE TABLE Customers (
    CustomerID INT PRIMARY KEY,
    Name VARCHAR(255)
);

INSERT INTO Customers (CustomerID, Name)
VALUES (1, 'John'),
       (2, 'Peter'),
       (3, 'Mary'),
       (4, 'Sally');

View Customers table

select * from Customers;

CustomerID	Name
1	John
2	Peter
3	Mary
4	Sally

Orders Table:

CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,
    CustomerID INT,
    Product VARCHAR(255)
);

INSERT INTO Orders (OrderID, CustomerID, Product)
VALUES (1, 1, 'Apples'),
       (2, 2, 'Bananas'),
       (3, 4, 'Grapes'),
       (4, 5, 'Oranges');

View Orders table

select * from Orders;

OrderID	CustomerID	Product
1	1	Apples
2	2	Bananas
3	4	Grapes
4	5	Oranges

17(a) – INNER JOIN

The INNER JOIN keyword selects records that have matching values in both tables.

SQL Statement:

SELECT Orders.*, Customers.Name 
FROM Customers
INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID;

OrderID	CustomerID	Product	Name
1	1	Apples	John
2	2	Bananas	Peter
3	4	Grapes	Sally

17(b) – FULL JOIN

The FULL JOIN keyword returns all records when there is a match in the left (Customers) or the right (Orders) table. Records that do not have a match in the other table are shown as NULL.

SQL Statement:

SELECT Customers.Name, Orders.Product
FROM Customers
FULL OUTER JOIN Orders ON Customers.CustomerID = Orders.CustomerID;

Name	Product
John	Apples
Peter	Bananas
Mary	None
Sally	Grapes
None	Oranges

Name	Product
John	Apples
Peter	Bananas
Mary	NULL
Sally	Grapes
NULL	Oranges

%%html
<style>
table {float:left}
</style>

17(c) – LEFT JOIN

The LEFT JOIN keyword returns all records from the left table (Customers), and the matched records from the right table (Orders). The result is NULL from the right side if there is no match.

SQL Statement:

SELECT Customers.Name, Orders.Product
FROM Customers
LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID;

Name	Product
John	Apples
Peter	Bananas
Mary	None
Sally	Grapes

17(d) – RIGHT JOIN

The RIGHT JOIN keyword returns all records from the right table (Orders), and the matched records from the left table (Customers). The result is NULL from the left side when there is no match.

SQL Statement:

SELECT Customers.Name, Orders.Product
FROM Customers
RIGHT JOIN Orders ON Customers.CustomerID = Orders.CustomerID;

Name	Product
John	Apples
Peter	Bananas
Sally	Grapes
None	Oranges

Name	Product
John	Apples
Peter	Bananas
Sally	Grapes
NULL	Oranges

18. Exercise 4: Joining Exercise

Try the following in MySQL Workbench on the Sakila Database

From Sakila Database, create a new table that contains:
- Film name
- Corresponding Actor name
- Release Year
- Rating
- Rental_Rate

Make sure none of the films get missed out in the result.

Hint: Use Film table and Film Actor Table.

What is the average rental rate for each Film actor?

Note:

The Sakila db is present as a default database in MySQL workbench.
If not present, follow the instructions in the ‘Setup MySQL WorkBench’ lesson above to set it up.
Basically, you need to run the 2 SQL scripts inside ‘Sakila-db’ directory: sakila-schema.sql, then, sakila-data.sql.

19. SQL DateTime Operations

SQL provides a set of functions for working with dates and times. These functions let you manipulate and format date and time values in your SQL queries.

1) Extract the Date and time from a DateTime
2) Extract Year, Month, and Day
3) Extract Hours, Minutes, and Seconds
4) Increment and decrement Date Column by Days, Months and Years
5) Current date, time, and timestamp
6) Calculate Date Difference in Days, Years and Months

CREATE TABLE Employees (
    ID INT PRIMARY KEY,
    Name VARCHAR(100),
    HireDate DATETIME
);

INSERT INTO Employees (ID, Name, HireDate)
VALUES 
(1, 'John Doe', '2017-03-12 09:30:10'),
(2, 'Jane Smith', '2018-05-14 15:20:08'),
(3, 'Emily Johnson', '2019-07-30 13:15:15'),
(4, 'Robert Brown', '2020-10-20 10:00:35');

Select * from Employees;

ID	Name	HireDate
1	John Doe	2017-03-12 09:30:10
2	Jane Smith	2018-05-14 15:20:08
3	Emily Johnson	2019-07-30 13:15:15
4	Robert Brown	2020-10-20 10:00:35

1) Extract the Date and time from a DateTime

You can extract the date part from a DateTime using the DATE() function.

SELECT *, 
      DATE(HireDate) AS 'Hire Date',
      TIME(HireDate) AS 'Hire Time'
FROM Employees;

ID	Name	HireDate	Hire Date	Hire Time
1	John Doe	2017-03-12 09:30:10	2017-03-12	09:30:10
2	Jane Smith	2018-05-14 15:20:08	2018-05-14	15:20:08
3	Emily Johnson	2019-07-30 13:15:15	2019-07-30	13:15:15
4	Robert Brown	2020-10-20 10:00:35	2020-10-20	10:00:35

2. Extract Year, Month, and Day

You can use the strftime('%Y', date_col), strftime('%m', date_col), and strftime('%d', date_col) functions to extract the respective date components from a DateTime value.

SELECT *, 
    strftime('%Y', HireDate) as 'HireYear',
    strftime('%m', HireDate) as 'HireMonth',
    strftime('%d', HireDate) as 'HireDay'
FROM Employees;

ID	Name	HireDate	HireYear	HireMonth	HireDay
1	John Doe	2017-03-12 09:30:10	2017	03	12
2	Jane Smith	2018-05-14 15:20:08	2018	05	14
3	Emily Johnson	2019-07-30 13:15:15	2019	07	30
4	Robert Brown	2020-10-20 10:00:35	2020	10	20

3. Extract Hours, Minutes, and Seconds

You can use the strftime('%H', date_col), strftime('%M', date_col), and strftime('%S', date_col) functions to extract the respective date components from a DateTime value.

SELECT *, 
    strftime('%H', HireDate) as 'HireHour',
    strftime('%M', HireDate) as 'HireMinute',
    strftime('%S', HireDate) as 'HireSecond'
FROM Employees;

ID	Name	HireDate	HireHour	HireMinute	HireSecond
1	John Doe	2017-03-12 09:30:10	09	30	10
2	Jane Smith	2018-05-14 15:20:08	15	20	08
3	Emily Johnson	2019-07-30 13:15:15	13	15	15
4	Robert Brown	2020-10-20 10:00:35	10	00	35

4. Increment and decrement Date Column by Days, Months and Years

SELECT *,
    DATETIME(HireDate, '+10 days') AS Add_Days,
    DATETIME(HireDate, '-1 months') AS Add_Months,
    DATETIME(HireDate, '+1 years') AS Add_Years

FROM Employees;

ID	Name	HireDate	Add_Days	Add_Months	Add_Years
1	John Doe	2017-03-12 09:30:10	2017-03-22 09:30:10	2017-02-12 09:30:10	2018-03-12 09:30:10
2	Jane Smith	2018-05-14 15:20:08	2018-05-24 15:20:08	2018-04-14 15:20:08	2019-05-14 15:20:08
3	Emily Johnson	2019-07-30 13:15:15	2019-08-09 13:15:15	2019-06-30 13:15:15	2020-07-30 13:15:15
4	Robert Brown	2020-10-20 10:00:35	2020-10-30 10:00:35	2020-09-20 10:00:35	2021-10-20 10:00:35

5. Current date, time, and timestamp: CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP functions will return the current date, time, and timestamp respectively.

SELECT CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP;

CURRENT_DATE	CURRENT_TIME	CURRENT_TIMESTAMP
2023-08-20	07:27:36	2023-08-20 07:27:36

6. Calculate Date Difference in Days, Years and Months

SELECT *,
round(julianday('now') - julianday(HireDate),0) As Date_Diff_Days,
round((julianday('now') - julianday(HireDate))/30.44,0) As Date_Diff_Months,
round((julianday('now') - julianday(HireDate))/365.25,0) As Date_Diff_Years
FROM Employees;

ID	Name	HireDate	Date_Diff_Days	Date_Diff_Months	Date_Diff_Years
1	John Doe	2017-03-12 09:30:10	2352.0	77.0	6.0
2	Jane Smith	2018-05-14 15:20:08	1924.0	63.0	5.0
3	Emily Johnson	2019-07-30 13:15:15	1482.0	49.0	4.0
4	Robert Brown	2020-10-20 10:00:35	1034.0	34.0	3.0

SQL Tutorial – A Simple and Intuitive Guide to the Structured Query Language

1. Why you should learn SQL?

The History of SQL

Why Learn SQL?

2. Types of SQL servers

4. What is a Database and Table?

SQL Syntax example

5. Setup Python Environment for SQL

6. Import necessary libraries and create sample data

Basic outline to follow while learning SQL in a structured manner

1. Create DB and Run SQL in Jupyter Environment

2. Data wrangling in SQL vs Python

3. Setup MySQL Workbench

Information on Sakila Database we use for Exercises

2. Create Table manually in DB

SQL Variables Types:

2b. Exercise: Create Orders table from raw data.

3. Create New Table from existing table

4. Show Tables from existing Database

5. Insert Into Select: copy data from one table to another table

6. SQL DELETE: Delete rows in a existing Dataset using where condition

7. Delete a Table

Check the DROP TABLE Command after dropping table (Students_test2) in Database

8. SQL Update : Update rows/columns in a existing Dataset

8b. Exercise 2: Updating Table

9. SQL Select : Select data / required vaiables from a dataset

10. SQL Comments

11. Limit the number of rows returned from a query

Fetching Top N Records

12. SELECT DISTINCT : Identify and Delete duplicates in a dataset

13. SQL Where: Filtering rows

13b. Exercise 3: Filtering Data

14. SQL Order By: Sort the data in ascending or descending order

15. SQL Group By : Aggregate identical data into groups

16. SQL Having: Filter rows after aggregating

17. Joining Datasets

Types of SQL Joins

Customer Table:

Orders Table:

17(a) – INNER JOIN

17(b) – FULL JOIN

17(c) – LEFT JOIN

17(d) – RIGHT JOIN

18. Exercise 4: Joining Exercise

19. SQL DateTime Operations

More Articles

How to sort multiple columns in SQL and in different directions?

How to count the number of work days between two dates?

How to compute maximum of multiple columns, aks row wise max?

How to use the GROUP BY clause on multiple columns in SQL?

How to get a list of dates between two dates in SQL?

What is the difference between CROSS JOIN and INNER JOIN in SQL?

Similar Articles

Complete Introduction to Linear Regression in R

How to implement common statistical significance tests and find the p value?

Logistic Regression – A Complete Tutorial With Examples in R

Machine Learning A-Z™: Hands-On Python & R In Data Science

Free Sample Videos:

Machine Learning A-Z™: Hands-On Python & R In Data Science

Machine Learning A-Z™: Hands-On Python & R In Data Science

Machine Learning A-Z™: Hands-On Python & R In Data Science

Machine Learning A-Z™: Hands-On Python & R In Data Science

Machine Learning A-Z™: Hands-On Python & R In Data Science