Let’s look at how to create histograms in pandas and why they are useful.

Histograms are very useful in statistical analysis. They represent the frequency distribution of a numeric array by splitting its range into equal-sized bins and counting the values that fall in each bin. Since pandas is widely used to work with tabular data, it’s important to know how to plot histograms from a pandas dataframe.

`pandas.DataFrame.hist` and `pandas.DataFrame.plot.hist` are two popular functions. You can use them to plot histograms directly from pandas dataframes.

In this article, you will see the different features of both these functions, the differences between them, and scenarios in which you can use them.

## Create a Sample Dataset

**Step 1: Import pandas**

```
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings("ignore")
```

**Step 2: Create a DataFrame**

```
df = pd.DataFrame({
'Subject_1': [70.5, 80.7, 50.4, 70.5, 80.9],
'Subject_2': [40.24, 50.9, 70.6, 80.1, 50.9],
'Subject_3': [30, 50.5, 70.8, 90.88, 30],
'School': ['School_1', 'School_1', 'School_1', 'School_2', 'School_2']
})
df
```

## Plot Histogram using .hist() function in pandas

### Default plot

To plot a histogram using pandas, call the `.hist()` function on the dataframe. This draws a separate histogram for each numeric column in the dataframe.

*Note: Adding a semicolon at the end of the last line of a notebook cell suppresses the text output returned by the function.*

```
# Histogram function on dataframe
df.hist();
```
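To predict which columns `.hist()` will draw, it can help to list the numeric columns first. The sketch below rebuilds the sample dataframe so that it runs on its own; using `select_dtypes` for this check is my own choice, not a step the workflow above requires.

```python
import pandas as pd

# Same sample data as in the article
df = pd.DataFrame({
    'Subject_1': [70.5, 80.7, 50.4, 70.5, 80.9],
    'Subject_2': [40.24, 50.9, 70.6, 80.1, 50.9],
    'Subject_3': [30, 50.5, 70.8, 90.88, 30],
    'School': ['School_1', 'School_1', 'School_1', 'School_2', 'School_2']
})

# Keep only the numeric columns; 'School' holds strings and is left out
numeric_cols = df.select_dtypes(include="number").columns.tolist()
print(numeric_cols)
```

The `School` column does not appear in the list, which matches the fact that it gets no histogram in the default plot.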

### Defining bins to control the number of bars of a histogram

Bins are the class intervals into which the data is grouped. The histogram shows the number of values that fall in each interval. By default, the `hist()` function uses 10 bins. You can change the bins in two ways:

**1. Pass the number of bins**

You can directly pass the number of bins you want in your histogram.

```
# Histogram with bins=3
df.hist(bins=3);
```

**2. Pass the bin edges**

You can also pass a list of bin edges. Pandas uses the list values as the boundaries of the bins. Values that fall outside the first and last edge are left out of the plot.

```
# Histogram with custom bins
df.hist(bins=[20, 35, 50, 80]);
```
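To see what the two ways of setting bins do to the counts, the following sketch computes the bin counts directly with `numpy.histogram` instead of plotting them. It assumes the plotted bars follow the same binning rules, which holds for the default matplotlib backend; the variable names are only illustrative.

```python
import numpy as np

# Values copied from the sample dataframe
subject_1 = [70.5, 80.7, 50.4, 70.5, 80.9]
subject_3 = [30, 50.5, 70.8, 90.88, 30]

# Option 1: ask for 3 equal-width bins between the min and max of the data
counts_equal, edges_equal = np.histogram(subject_1, bins=3)
print(counts_equal, edges_equal)

# Option 2: pass the bin edges explicitly
counts_custom, edges_custom = np.histogram(subject_3, bins=[20, 35, 50, 80])
print(counts_custom, edges_custom)
```

With the custom edges, the value 90.88 lies beyond the last edge of 80, so it is not counted in any bin.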

### Make a histogram for a specific column

By default, you get a histogram for each numeric column of your dataframe. If you want a plot for a specific column only, pass the column name to the `column` parameter of the `hist()` function.

```
# Histogram of 'Subject_1' column
df.hist(column='Subject_1');
```
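The `column` parameter also accepts a list of column names. The sketch below is a small illustration under that assumption; it rebuilds only the numeric part of the sample data, and the choice of `bins=3` is arbitrary.

```python
import pandas as pd

df = pd.DataFrame({
    'Subject_1': [70.5, 80.7, 50.4, 70.5, 80.9],
    'Subject_2': [40.24, 50.9, 70.6, 80.1, 50.9],
    'Subject_3': [30, 50.5, 70.8, 90.88, 30],
})

# Plot only two of the three columns; hist() returns an array of matplotlib axes
axes = df.hist(column=['Subject_1', 'Subject_3'], bins=3)

# Each used subplot is titled with the column it shows
titles = sorted(ax.get_title() for ax in axes.ravel() if ax.get_title())
print(titles)
```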

### How to Plot Histograms for different groups within a given column in pandas

The `hist()` function can plot separate histograms for different groups of data. With the `by` parameter, you specify the column whose values define the groups. A separate histogram is then created for each group.

In the example below, two histograms are created for the `Subject_1` column. The groups are based on the `School` column.

```
df.hist(column='Subject_1', by='School');
```
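To check which values end up in each group's histogram, you can run the equivalent grouping with `groupby`. This is only a sketch to inspect the data behind the plot; it does not claim to reproduce how pandas builds the figure internally.

```python
import pandas as pd

df = pd.DataFrame({
    'Subject_1': [70.5, 80.7, 50.4, 70.5, 80.9],
    'School': ['School_1', 'School_1', 'School_1', 'School_2', 'School_2']
})

# Print the Subject_1 values that belong to each school
for school, marks in df.groupby('School')['Subject_1']:
    print(school, marks.tolist())

# Number of values per group, i.e. how many observations each histogram holds
group_sizes = df.groupby('School')['Subject_1'].size().to_dict()
print(group_sizes)
```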

## Plotting Histogram using .plot() function in pandas instead of .hist()

Histograms can also be created by using the `plot()` function on pandas dataframes. The main difference between the `.hist()` and `.plot()` functions is that `.plot()` draws the histograms of all numeric columns of the dataframe on the same figure, sharing one set of axes. No separate plots are made in the case of the `.plot()` function.

Like the `hist()` function, the `plot()` function accepts the `bins` and `by` parameters. The `plot()` function can be used for histogram plotting in two ways:

**1. By using the kind parameter**

The `plot()` function has a `kind` parameter that specifies the kind of plot to create. For a histogram, pass the value `hist`. Other plot types have their own values.

```
df.plot(kind='hist');
```

Notice that the `plot()` function has automatically assigned colors and a legend to the plot.

**2. By using the hist method of the plot function**

You can also call the `hist` method directly on `plot`. Just add `.hist()` after `.plot`.

```
df.plot.hist();
```
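To make the difference between the two approaches concrete, the sketch below calls both on the same data and inspects what they return. The `alpha=0.5` setting is my own addition so that overlapping bars stay visible, and the numeric columns are selected explicitly to keep the example independent of how non-numeric columns are handled.

```python
import pandas as pd

df = pd.DataFrame({
    'Subject_1': [70.5, 80.7, 50.4, 70.5, 80.9],
    'Subject_2': [40.24, 50.9, 70.6, 80.1, 50.9],
    'Subject_3': [30, 50.5, 70.8, 90.88, 30],
})

# .hist() returns an array of axes: one subplot per column
grid = df.hist(bins=3)
print(type(grid), grid.shape)

# .plot.hist() returns a single axes object holding all columns
ax = df[['Subject_1', 'Subject_2', 'Subject_3']].plot.hist(bins=3, alpha=0.5)

# The legend identifies which color belongs to which column
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(legend_labels)
```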

## Practical Tips

- Use the pandas plotting functions when you want quick, instant data visualizations. You do not have to call an external plotting library yourself, although these functions use matplotlib under the hood.
- You can also use other options offered by matplotlib, such as rotating the x labels or adding a title, to improve the readability of the plots.
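As a rough illustration of the second tip, the sketch below adds a title, rotates the x tick labels, and sets an x-axis label on a single-column histogram. The title text and the label `Marks` are placeholders chosen for this example.

```python
import pandas as pd

# One column of the sample data as a Series
marks = pd.Series([70.5, 80.7, 50.4, 70.5, 80.9], name='Subject_1')

# title and rot are passed through pandas to the underlying matplotlib plot
ax = marks.plot.hist(bins=3, title='Distribution of Subject_1 marks', rot=45)

# The returned matplotlib axes can be customized further
ax.set_xlabel('Marks')
```

Because the call returns a regular matplotlib axes object, any further matplotlib customization can be applied to `ax` in the same way.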

To learn similar applications, you can check out our blog on common pandas exercises here.

## Test your knowledge

**Q1:** Which parameter in the `hist` function is used to plot a single column histogram?

**Answer:** The `column` parameter is used to plot a single column histogram.

**Q2:** What is the use of the `kind` parameter in the `plot` function?

**Answer:** The `kind` parameter defines the type of plot: `bar` for a bar plot, `line` for a line plot, and so on. The full list is given in the pandas documentation for `DataFrame.plot`.

**Q3:** We can define the bins of the histogram using the `interval` parameter. True/False?

**Answer:** False. The `bins` parameter is used.

The article was contributed by Kaustubh and Shrivarsheni