Pandas Histogram

Let’s understand how to create a histogram in pandas and why it is useful.

Histograms are very useful in statistical analysis. They are generally used to show the frequency distribution of a numeric array: the range of values is split into equal-sized bins, and each bar shows how many values fall into a bin. Since we use pandas to work with tabular data, it’s important to know how to create histograms from a pandas dataframe.

pandas.DataFrame.hist and pandas.DataFrame.plot.hist are two popular functions for this. You can use them to plot histograms directly from pandas dataframes.

In this article, you will see the different features of both these functions, the differences between them, and scenarios in which you can use them.

Create a Sample Dataset

Step 1: Import the libraries

import pandas as pd
import numpy as np
import warnings

warnings.filterwarnings("ignore")


Step 2: Create a DataFrame

df = pd.DataFrame({
    'Subject_1': [70.5, 80.7, 50.4, 70.5, 80.9],
    'Subject_2': [40.24, 50.9, 70.6, 80.1, 50.9],
    'Subject_3': [30, 50.5, 70.8, 90.88, 30],
    'School': ['School_1', 'School_1', 'School_1', 'School_2', 'School_2']
})

df


Plot Histogram using .hist() function in pandas

Default plot

To plot a histogram with pandas, call the .hist() method on the dataframe. It draws a separate histogram for each numeric column of the dataframe; non-numeric columns such as School are skipped.

Note: In a Jupyter notebook, adding a semicolon at the end of the last line suppresses the function's text output (here, the array of matplotlib Axes returned by hist()).

# Histogram function on dataframe
df.hist();
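To see which columns hist() actually draws, here is a minimal sketch. It assumes matplotlib is installed and selects the non-interactive "Agg" backend only so the script runs without a display. It lists the numeric columns and reads the title of each subplot that hist() returns.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render without opening a window
import pandas as pd

df = pd.DataFrame({
    'Subject_1': [70.5, 80.7, 50.4, 70.5, 80.9],
    'Subject_2': [40.24, 50.9, 70.6, 80.1, 50.9],
    'Subject_3': [30, 50.5, 70.8, 90.88, 30],
    'School': ['School_1', 'School_1', 'School_1', 'School_2', 'School_2']
})

# Only numeric columns are plotted; the text column 'School' is skipped
numeric_cols = df.select_dtypes(include='number').columns.tolist()
print(numeric_cols)

# hist() returns a 2-D NumPy array of matplotlib Axes: one subplot per column,
# plus any unused cells of the grid, which pandas hides and leaves untitled
axes = df.hist()
titles = [ax.get_title() for ax in axes.ravel() if ax.get_title()]
print(titles)
```

Each subplot is titled with its column name, so the two lists match.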

Define bins to control the number of bars in a histogram

Bins are the class intervals into which the data is grouped; the height of each bar is the number of values that fall into its interval. By default, the hist() function uses 10 bins. You can change the bins in two ways:

1. Pass the number of bins

You can directly pass the number of bins you want in your histogram.

# Histogram with bins=3

df.hist(bins=3);
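The bar heights are simply bin counts. As a sketch, assuming the plot's bins span each column's own minimum to maximum (which is what matplotlib does when no range is given), np.histogram reproduces the counts for Subject_1 with the default 10 bins and with bins=3. With only five values, 10 bins leave most bars empty.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Subject_1': [70.5, 80.7, 50.4, 70.5, 80.9]})

# 10 equal-width bins between the min (50.4) and max (80.9), as in the default plot
counts_10, edges_10 = np.histogram(df['Subject_1'], bins=10)

# 3 equal-width bins over the same range, as in df.hist(bins=3)
counts_3, edges_3 = np.histogram(df['Subject_1'], bins=3)

print(counts_10.tolist())
print(counts_3.tolist())
print(edges_3.round(2).tolist())
```

Fewer, wider bins group the five scores into bars that are easier to compare.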


2. Pass the bin edges

You can also pass a list of bin edges. With [20, 35, 50, 80], pandas creates the bins [20, 35), [35, 50) and [50, 80]. Values below the first edge or above the last edge are not counted.

# Histogram with custom bins

df.hist(bins=[20, 35, 50, 80]);
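Custom edges have a side effect worth checking: values outside them disappear from the plot. The sketch below uses np.histogram, which follows the same convention (half-open bins, with the last bin closed), to count each column with the edges [20, 35, 50, 80]. Subject_3's score of 90.88, for example, lies above 80 and is left out.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Subject_1': [70.5, 80.7, 50.4, 70.5, 80.9],
    'Subject_2': [40.24, 50.9, 70.6, 80.1, 50.9],
    'Subject_3': [30, 50.5, 70.8, 90.88, 30],
})

edges = [20, 35, 50, 80]  # bins: [20, 35), [35, 50), [50, 80]
counts = {}
for col in df.columns:
    hist_counts, _ = np.histogram(df[col], bins=edges)
    counts[col] = hist_counts.tolist()
    # Values below 20 or above 80 fall outside every bin and are not counted
    print(col, counts[col], f"{hist_counts.sum()} of {len(df)} values counted")
```

If every value should appear, make sure the first and last edges cover the full range of the data.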


Make a histogram for a specific column

By default, you get a histogram for every numeric column of your dataframe. To plot only a specific column, pass its name to the column parameter of the hist() function.

# Histogram of 'Subject_1' column
df.hist(column='Subject_1');
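The column parameter also accepts a list of column names, which limits the plot to that subset. A minimal sketch, again using the non-interactive "Agg" backend only so it runs headless:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import pandas as pd

df = pd.DataFrame({
    'Subject_1': [70.5, 80.7, 50.4, 70.5, 80.9],
    'Subject_2': [40.24, 50.9, 70.6, 80.1, 50.9],
    'Subject_3': [30, 50.5, 70.8, 90.88, 30],
    'School': ['School_1', 'School_1', 'School_1', 'School_2', 'School_2']
})

# One subplot per listed column; all other columns are ignored
axes = df.hist(column=['Subject_1', 'Subject_3'], bins=3)
titles = [ax.get_title() for ax in axes.ravel() if ax.get_title()]
print(titles)
```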

How to Plot Histograms for different groups within a given column in pandas

The hist() function can also plot separate histograms for different groups of data. Use the by parameter to name the column whose values define the groups; pandas then draws one histogram per group.

In the example below, two histograms are created for the Subject_1 column. The groups are created based on the School column.

df.hist(column='Subject_1', by='School');
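To check what each grouped histogram contains, the sketch below lists the Subject_1 values per school with groupby and reads the subplot titles, which pandas sets to the group names. The "Agg" backend is an assumption made only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Subject_1': [70.5, 80.7, 50.4, 70.5, 80.9],
    'School': ['School_1', 'School_1', 'School_1', 'School_2', 'School_2']
})

# The values behind each subplot: one group per distinct 'School' value
groups = df.groupby('School')['Subject_1'].apply(list).to_dict()
print(groups)

# One subplot per group, titled with the group key
axes = df.hist(column='Subject_1', by='School')
titles = [ax.get_title() for ax in np.asarray(axes).ravel()]
print(titles)
```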


Plotting Histogram using .plot() function in pandas instead of .hist()

A histogram can also be created with the plot() function on a pandas dataframe. The main difference between .hist() and .plot() is that .plot() draws the histograms of all numeric columns on the same axes, overlaid on one another, whereas .hist() creates a separate subplot for each column.
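The difference is visible in the return values: df.hist() returns an array of subplots, while df.plot.hist() returns a single matplotlib Axes whose legend lists every numeric column. A minimal sketch, assuming matplotlib is installed (the "Agg" backend only avoids opening windows):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
from matplotlib.axes import Axes
import pandas as pd

df = pd.DataFrame({
    'Subject_1': [70.5, 80.7, 50.4, 70.5, 80.9],
    'Subject_2': [40.24, 50.9, 70.6, 80.1, 50.9],
    'Subject_3': [30, 50.5, 70.8, 90.88, 30],
    'School': ['School_1', 'School_1', 'School_1', 'School_2', 'School_2']
})

grid = df.hist()         # NumPy array of Axes, one titled subplot per numeric column
single = df.plot.hist()  # a single Axes with all numeric columns overlaid

n_subplots = sum(1 for ax in grid.ravel() if ax.get_title())
legend_labels = [t.get_text() for t in single.get_legend().get_texts()]
print(n_subplots, isinstance(single, Axes), legend_labels)
```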

Like hist(), the plot() function also accepts the bins and by parameters. You can use it to plot histograms in two ways:

1. By using the kind parameter

The plot() function has a kind parameter that sets the type of plot to create. For a histogram, pass kind='hist'; other values, such as 'line', 'bar' or 'box', produce other plot types.

df.plot(kind='hist');


Notice that plot() automatically assigns a color to each column and adds a legend.

2. By using the hist method of the plot accessor

You can also call the hist method directly on the plot accessor: just add .hist() after .plot. This is equivalent to df.plot(kind='hist').

df.plot.hist();
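Both call styles accept the same options. In the sketch below, bins=3 is passed to each, and alpha (a matplotlib option that pandas forwards to the bars) makes the overlaid bars partly transparent so overlapping columns stay visible. Comparing the bar heights confirms that the two forms draw the same histogram; the "Agg" backend is used only to run without a display.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import pandas as pd

df = pd.DataFrame({
    'Subject_1': [70.5, 80.7, 50.4, 70.5, 80.9],
    'Subject_2': [40.24, 50.9, 70.6, 80.1, 50.9],
    'Subject_3': [30, 50.5, 70.8, 90.88, 30],
    'School': ['School_1', 'School_1', 'School_1', 'School_2', 'School_2']
})

ax_kind = df.plot(kind='hist', bins=3, alpha=0.5)
ax_method = df.plot.hist(bins=3, alpha=0.5)

# pandas computes one shared set of bin edges for all numeric columns,
# so each column contributes one bar per bin (3 columns x 3 bins)
heights_kind = [p.get_height() for p in ax_kind.patches]
heights_method = [p.get_height() for p in ax_method.patches]
print(len(heights_kind), heights_kind == heights_method)
```

Because the bins are shared, the overlaid bars line up, which makes the columns directly comparable.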


Practical Tips

1. Use pandas for quick plotting to get instant data visualizations. You don't need to write code for a separate plotting library, although these functions use matplotlib under the hood.
2. You can also pass extra parameters to improve the readability of the plots, such as xrot to rotate the x-axis labels in hist(), or title to add a plot title in plot().
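As an illustration of these tips, here is a sketch using parameters these functions accept: figsize and xrot (x-label rotation in degrees) for DataFrame.hist, and title and alpha for DataFrame.plot.hist. The returned Axes is a regular matplotlib object, so plain matplotlib calls such as set_xlabel also work. The title and label strings are only examples.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import pandas as pd

df = pd.DataFrame({
    'Subject_1': [70.5, 80.7, 50.4, 70.5, 80.9],
    'Subject_2': [40.24, 50.9, 70.6, 80.1, 50.9],
    'Subject_3': [30, 50.5, 70.8, 90.88, 30],
})

# Grid of per-column histograms: larger figure, x tick labels rotated 45 degrees
axes = df.hist(bins=3, figsize=(10, 6), xrot=45)

# Overlaid histogram with a title and semi-transparent bars
ax = df.plot.hist(bins=3, alpha=0.5, title='Score distribution by subject')
ax.set_xlabel('Score')  # plain matplotlib call on the returned Axes
print(ax.get_title(), ax.get_xlabel())
```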

Q1: Which parameter in the hist function is used to plot a single column histogram?

Answer: The column parameter is used to plot a single column histogram.

Q2: What is the use of the kind parameter in the plot function?

Answer: The kind parameter defines the type of plot: 'bar' for a bar plot, 'line' for a line plot, 'hist' for a histogram, and so on. The full list of values is in the pandas DataFrame.plot documentation.

Q3: We can define the bins of the histogram using the interval parameter. True/False?

Answer: False. The bins parameter is used.

The article was contributed by Kaustubh and Shrivarsheni

Course Preview