Let’s look at how to create a histogram in pandas and why it is useful.
Histograms are widely used in statistical analysis. A histogram represents the frequency distribution of a numeric array by splitting its range into bins, usually of equal width, and counting the values that fall in each bin. Since pandas is commonly used to work with tabular data, it’s important to know how to plot histograms from a pandas DataFrame.
pandas.DataFrame.hist and pandas.DataFrame.plot.hist are two popular functions for this. You can use them to plot histograms directly from pandas DataFrames.
In this article, you will see the features of both functions, the differences between them, and the scenarios in which you can use each.
Create a Sample Dataset
Step 1: Import pandas
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings("ignore")
Step 2: Create a DataFrame
df = pd.DataFrame({
'Subject_1': [70.5, 80.7, 50.4, 70.5, 80.9],
'Subject_2': [40.24, 50.9, 70.6, 80.1, 50.9],
'Subject_3': [30, 50.5, 70.8, 90.88, 30],
'School': ['School_1', 'School_1', 'School_1', 'School_2', 'School_2']
})
df
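
Before plotting, it can help to check which columns pandas treats as numeric, since those are the ones the histogram functions draw. A minimal sketch using the same data as above; the name numeric_cols is only illustrative:

```python
import pandas as pd

# Same sample data as in the article
df = pd.DataFrame({
    'Subject_1': [70.5, 80.7, 50.4, 70.5, 80.9],
    'Subject_2': [40.24, 50.9, 70.6, 80.1, 50.9],
    'Subject_3': [30, 50.5, 70.8, 90.88, 30],
    'School': ['School_1', 'School_1', 'School_1', 'School_2', 'School_2']
})

# Keep only the columns with a numeric dtype; 'School' holds text and is dropped
numeric_cols = df.select_dtypes(include="number").columns.tolist()
print(numeric_cols)
```

The School column is text, so it is left out of the histograms and is used later only for grouping.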

Plot Histogram using .hist() function in pandas
Default plot
To plot a histogram with pandas, call the .hist() method on the DataFrame. It draws one histogram for each numeric column in the DataFrame.
Note: In a notebook, adding a semicolon at the end of the line suppresses the text output (the returned array of axes) that would otherwise be printed above the plot.
# Histogram function on dataframe
df.hist();
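
The call also returns the matplotlib axes it drew on, which is the text output the semicolon hides. A small sketch, assuming you want to inspect those axes; the Agg backend line is an assumption that only makes the snippet run without a display and can be dropped in a notebook:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, not needed in a notebook
import pandas as pd

df = pd.DataFrame({
    'Subject_1': [70.5, 80.7, 50.4, 70.5, 80.9],
    'Subject_2': [40.24, 50.9, 70.6, 80.1, 50.9],
    'Subject_3': [30, 50.5, 70.8, 90.88, 30],
    'School': ['School_1', 'School_1', 'School_1', 'School_2', 'School_2']
})

# df.hist() returns an array of matplotlib Axes, one per subplot in the grid
axes = df.hist()

# Each subplot that was used is titled with its column name;
# unused grid cells have an empty title and are skipped here
titles = [ax.get_title() for ax in axes.ravel() if ax.get_title()]
print(titles)
```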

Defining bins to control the number of bars of a histogram
Bins are the class intervals into which the data is grouped. The height of each bar is the number of values that fall in that interval. By default, the hist() function uses 10 bins. You can change the bins in two ways:
1. Pass the number of bins
You can directly pass the number of bins you want in your histogram.
# Histogram with bins=3
df.hist(bins=3);
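
To see the numbers behind the bars, you can compute the same counts without plotting. This is a sketch that uses numpy.histogram, which is what matplotlib relies on to bin the data; it is applied to the Subject_1 values only, and the variable names are illustrative:

```python
import numpy as np
import pandas as pd

scores = pd.Series([70.5, 80.7, 50.4, 70.5, 80.9], name="Subject_1")

# bins=3 splits the range [min, max] into 3 equal-width intervals
counts, edges = np.histogram(scores.to_numpy(), bins=3)

print(edges)   # 4 edges delimit 3 bins
print(counts)  # number of values in each bin
```

Three bins always need four edges, and the counts add up to the number of rows because every value lies between the minimum and the maximum.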

2. Pass the bin edges
You can also pass a list of bin edges. Consecutive values in the list become the boundaries of each bin, so [20, 35, 50, 80] produces the bins 20 to 35, 35 to 50, and 50 to 80.
# Histogram with custom bins
df.hist(bins=[20, 35, 50, 80]);
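
One thing to watch with explicit edges is that values outside the outermost edges are not counted at all. A minimal sketch on the Subject_3 values, again using numpy.histogram to reproduce the binning; the names edges and outside are illustrative:

```python
import numpy as np
import pandas as pd

subject_3 = pd.Series([30, 50.5, 70.8, 90.88, 30], name="Subject_3")
edges = [20, 35, 50, 80]

# Count the values in [20, 35), [35, 50) and [50, 80]
counts, _ = np.histogram(subject_3.to_numpy(), bins=edges)

# Values below 20 or above 80 fall in no bin
outside = int(len(subject_3) - counts.sum())

print(counts.tolist())
print(outside)
```

Here 90.88 lies above the last edge, so only four of the five values appear in the histogram. If every value should be shown, make sure the edges cover the full range of the data.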

Make a histogram for a specific column
By default, you get a histogram for each numeric column of your DataFrame. If you want a plot of only a specific column, pass its name to the column parameter of the hist() function.
# Histogram of 'Subject_1' column
df.hist(column='Subject_1');
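
The column parameter also accepts a list of names when you want more than one column but not all of them. A short sketch under the same setup as before (the Agg backend line is an assumption for running without a display):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, not needed in a notebook
import pandas as pd

df = pd.DataFrame({
    'Subject_1': [70.5, 80.7, 50.4, 70.5, 80.9],
    'Subject_2': [40.24, 50.9, 70.6, 80.1, 50.9],
    'Subject_3': [30, 50.5, 70.8, 90.88, 30],
    'School': ['School_1', 'School_1', 'School_1', 'School_2', 'School_2']
})

# Plot only two of the three numeric columns, with 5 bins each
axes = df.hist(column=['Subject_1', 'Subject_2'], bins=5)

# Collect the subplot titles to confirm which columns were drawn
titles = [ax.get_title() for ax in axes.ravel() if ax.get_title()]
print(titles)
```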

How to Plot Histograms for different groups within a given column in pandas
The hist() function can plot separate histograms for different groups of data. The by parameter takes the name of the column whose values define the groups, and a separate histogram is drawn for each group.
In the example below, two histograms are created for the Subject_1 column, one for each value of the School column.
df.hist(column='Subject_1', by='School');
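
To check which values end up in each panel, you can split the column with groupby, which mirrors the grouping that by performs. This is only an illustration of the idea, not how pandas implements it internally; the name grouped is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    'Subject_1': [70.5, 80.7, 50.4, 70.5, 80.9],
    'School': ['School_1', 'School_1', 'School_1', 'School_2', 'School_2']
})

# One list of Subject_1 scores per school: these are the values
# each per-group histogram is built from
grouped = {school: group['Subject_1'].tolist()
           for school, group in df.groupby('School')}

for school, values in grouped.items():
    print(school, values)
```

With so few rows per group, each histogram has only two or three values, so a small number of bins (for example bins=3) keeps the panels readable.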

Plotting Histogram using .plot() function in pandas instead of .hist()
A histogram can also be created with the plot() function on a pandas DataFrame. The main difference between .hist() and .plot() is that .plot() draws the histograms of all the numeric columns on the same axes, overlaid in a single figure. It does not make a separate subplot for each column.
The plot() function also accepts the bins and by parameters, just like hist(). It can be used for histogram plotting in two ways:
1. By using the kind parameter
The plot() function has a kind parameter that specifies the type of plot to create. For a histogram, pass kind='hist'. Other plot types have their own values.
df.plot(kind='hist');

Notice that the plot() function has automatically assigned a color to each column and added a legend to the plot.
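
Because all columns share one set of axes, the bars of one column can hide those of another. A common remedy is to make the bars semi-transparent with matplotlib's alpha option, which plot() passes through. A sketch with assumed values for bins and alpha:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, not needed in a notebook
import pandas as pd

df = pd.DataFrame({
    'Subject_1': [70.5, 80.7, 50.4, 70.5, 80.9],
    'Subject_2': [40.24, 50.9, 70.6, 80.1, 50.9],
    'Subject_3': [30, 50.5, 70.8, 90.88, 30],
    'School': ['School_1', 'School_1', 'School_1', 'School_2', 'School_2']
})

# alpha=0.5 makes overlapping bars see-through; a single Axes is returned
ax = df.plot(kind='hist', bins=5, alpha=0.5)

# The legend has one entry per numeric column
labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(labels)
```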
2. By using the hist method of plot
You can also call the hist method directly on plot. Just add .hist() after .plot.
df.plot.hist();
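
The two forms are interchangeable. The sketch below draws the same histogram both ways and compares the number of bars produced; counting patches is just one simple way to compare the results, and the variable names are illustrative:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, not needed in a notebook
import pandas as pd

df = pd.DataFrame({
    'Subject_1': [70.5, 80.7, 50.4, 70.5, 80.9],
    'Subject_2': [40.24, 50.9, 70.6, 80.1, 50.9],
    'Subject_3': [30, 50.5, 70.8, 90.88, 30],
})

# Same plot requested through the kind parameter and through the accessor
ax_kind = df.plot(kind='hist', bins=4)
ax_accessor = df.plot.hist(bins=4)

# Each bar of a histogram is a rectangle patch on the axes
n_kind = len(ax_kind.patches)
n_accessor = len(ax_accessor.patches)
print(n_kind, n_accessor)
```

Which form to use is mostly a matter of taste. The accessor form lists the available plot types through tab completion in most editors.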

Practical Tips
- Use the pandas plotting functions when you want quick, instant visualizations of your data. You don't have to call a plotting library yourself, although these functions use matplotlib under the hood, so matplotlib must be installed.
- You can also use other options offered by matplotlib, such as rotating the x-axis labels or adding a title, to improve the readability of the plots.
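
As an illustration of the second tip, the sketch below rotates the tick labels with the rot parameter of plot and then sets a title and an axis label on the returned axes. The title and label text are made up for the example:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, not needed in a notebook
import pandas as pd

df = pd.DataFrame({
    'Subject_1': [70.5, 80.7, 50.4, 70.5, 80.9],
    'Subject_2': [40.24, 50.9, 70.6, 80.1, 50.9],
    'Subject_3': [30, 50.5, 70.8, 90.88, 30],
})

# rot=45 rotates the tick labels; alpha keeps overlapping bars visible
ax = df.plot.hist(bins=5, alpha=0.5, rot=45)

# The returned object is a regular matplotlib Axes, so its methods apply
ax.set_title("Score distribution by subject")
ax.set_xlabel("Score")
```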
To learn similar applications, you can check out our blog on common pandas exercises here.
Test your knowledge
Q1: Which parameter of the hist function is used to plot a histogram of a single column?
Answer: The column parameter is used to plot a single-column histogram.
Q2: What is the use of the kind parameter in the plot function?
Answer: The kind parameter defines the type of plot: bar is for a bar plot, line is for a line plot, and so on. The full list is given in the pandas documentation.
Q3: We can define the bins of the histogram using the interval parameter. True/False?
Answer: False. The bins parameter is used.
The article was contributed by Kaustubh and Shrivarsheni