Pandas drop column: Different methods

To drop a single column or multiple columns from a pandas DataFrame in Python, you can use `df.drop()` and several other methods.

In many cases, some columns are not relevant to your analysis, so you should know how to drop them from a pandas DataFrame. When building machine learning models, columns are often removed because they are redundant or do not help the model.

The most common way to remove a column is `df.drop()`. Python's built-in `del` statement is also sometimes used.

Creating a Basic DataFrame

To understand how to drop a column, let us start by creating a basic pandas DataFrame.

import pandas as pd

# Create the data for the Dataframe
data_df = {'Name': ['Harvard', 'Yale', 'Cornell', 'Princeton', 'Dartmouth'],
           'Locations': ['Cambridge', 'New Haven', 'Ithaca', 'Princeton', 'Hanover'],
           'States': ['Massachusetts', 'Connecticut', 'New York', 'New Jersey', 'New Hampshire'],
           'Founder': ['John Harvard', 'The Founders', 'Ezra Cornell', 'John Witherspoon', 'George III'],
           'Founding Year': [1650, 1701, 1865, 1746, 1769]}

# Create the DataFrame
df = pd.DataFrame(data_df)
df

Using the del statement to drop a column

To drop a single column from a pandas DataFrame, you can use Python's built-in `del` statement. It removes the column from the DataFrame in place.

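# Work on a copy so that df keeps all its columns for the examples below
df_del = df.copy()
# Delete the column 'Locations' from the copy (in place)
del df_del['Locations']
df_del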
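A quick way to see how `del` differs from `drop()` is to inspect the DataFrame afterwards. The minimal sketch below rebuilds the sample data so it runs on its own: `drop()` returns a new DataFrame and leaves the original untouched, while `del` changes the original and returns nothing. Deleting a column that is already gone raises a KeyError.

```python
import pandas as pd

# Rebuild the article's sample DataFrame so the block runs on its own
df = pd.DataFrame({
    'Name': ['Harvard', 'Yale', 'Cornell', 'Princeton', 'Dartmouth'],
    'Locations': ['Cambridge', 'New Haven', 'Ithaca', 'Princeton', 'Hanover'],
    'States': ['Massachusetts', 'Connecticut', 'New York', 'New Jersey', 'New Hampshire'],
    'Founder': ['John Harvard', 'The Founders', 'Ezra Cornell', 'John Witherspoon', 'George III'],
    'Founding Year': [1650, 1701, 1865, 1746, 1769],
})

# drop() returns a new DataFrame; df itself still has all five columns
without_locations = df.drop(columns='Locations')
print(list(df.columns))
# ['Name', 'Locations', 'States', 'Founder', 'Founding Year']

# del removes the column from df itself and returns nothing
del df['Locations']
print(list(df.columns))
# ['Name', 'States', 'Founder', 'Founding Year']

# Deleting a column that no longer exists raises a KeyError
try:
    del df['Locations']
except KeyError:
    print('Locations has already been deleted')
```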

Using the drop method

You can use the DataFrame drop() method to drop one or more columns in several ways.

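pandas.DataFrame.drop(labels=None, *, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')

In pandas 2.0 and later, the * means that every argument after labels must be passed by keyword, for example axis=1 rather than a bare 1.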

Purpose: To drop the specified rows or columns from the DataFrame.

Parameters:
     labels: single label or list (default: None). The row or column labels to drop.
     axis: 0 or 'index', 1 or 'columns' (default: 0). Where to look for the labels. With 0, the labels are matched against the row index and those rows are dropped; with 1, they are matched against the column names and those columns are dropped.
     index: single label or list (default: None). Row labels to drop; index=labels is equivalent to labels=labels, axis=0.
     columns: single label or list (default: None). Column labels to drop; columns=labels is equivalent to labels=labels, axis=1.
     level: int or level name (default: None). For a MultiIndex, the level from which the labels are removed.
     inplace: bool (default: False). If False, a new DataFrame is returned and the original is left unchanged. If True, the original DataFrame is modified and None is returned.
     errors: 'ignore' or 'raise' (default: 'raise'). With 'raise', a KeyError is raised if any label is not found. With 'ignore', missing labels are skipped and only the existing labels are dropped.
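To see a few of these parameters in action, here is a short sketch on the sample data. `'Motto'` is a hypothetical column name that does not exist in the DataFrame; it is only there to show how `errors` behaves.

```python
import pandas as pd

# Rebuild the article's sample DataFrame so the block runs on its own
df = pd.DataFrame({
    'Name': ['Harvard', 'Yale', 'Cornell', 'Princeton', 'Dartmouth'],
    'Locations': ['Cambridge', 'New Haven', 'Ithaca', 'Princeton', 'Hanover'],
    'States': ['Massachusetts', 'Connecticut', 'New York', 'New Jersey', 'New Hampshire'],
    'Founder': ['John Harvard', 'The Founders', 'Ezra Cornell', 'John Witherspoon', 'George III'],
    'Founding Year': [1650, 1701, 1865, 1746, 1769],
})

# axis='columns' is accepted as an alias for axis=1
no_locations = df.drop(labels='Locations', axis='columns')

# With the default errors='raise', a missing label ('Motto') raises a KeyError
try:
    df.drop(columns=['Locations', 'Motto'])
except KeyError as exc:
    print('KeyError:', exc)

# With errors='ignore', the missing label is skipped and 'Locations' is still dropped
ignored = df.drop(columns=['Locations', 'Motto'], errors='ignore')

# inplace=True modifies df itself and returns None
result = df.drop(columns='Founder', inplace=True)
print(result)  # None
print(list(df.columns))  # ['Name', 'Locations', 'States', 'Founding Year']
```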

Dropping a single column

To drop a single column, pass its name to the labels parameter and set axis=1.

# Drop the column 'Locations'; drop() returns a new DataFrame and leaves df unchanged
df.drop(labels='Locations', axis=1)

Dropping multiple columns

To drop multiple columns, pass a list of the column names to the labels parameter.

df.drop(labels=['Locations', 'Founder'], axis=1)

Using the columns argument

With the columns argument, you do not need to set axis=1 to remove columns.
Passing the labels to columns ensures that only column labels are targeted.

# Pass the column name as the value to the columns parameter. The value of the axis parameter need not be passed.
df.drop(columns='Founder')

For dropping multiple columns using the columns argument, you can pass a list of column names which are to be dropped.

# Pass a list of column names to the columns parameter to drop multiple columns
df.drop(columns=['Founder', 'Locations'])
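The sketch below checks that `columns=` gives the same result as `labels=` with `axis=1`, and shows that `index` and `columns` can be combined to remove rows and columns in one call. It assumes the default integer row index (0 to 4) that `pd.DataFrame` creates for the sample data.

```python
import pandas as pd

# Rebuild the article's sample DataFrame so the block runs on its own
df = pd.DataFrame({
    'Name': ['Harvard', 'Yale', 'Cornell', 'Princeton', 'Dartmouth'],
    'Locations': ['Cambridge', 'New Haven', 'Ithaca', 'Princeton', 'Hanover'],
    'States': ['Massachusetts', 'Connecticut', 'New York', 'New Jersey', 'New Hampshire'],
    'Founder': ['John Harvard', 'The Founders', 'Ezra Cornell', 'John Witherspoon', 'George III'],
    'Founding Year': [1650, 1701, 1865, 1746, 1769],
})

to_drop = ['Founder', 'Locations']

# Both calls drop the same two columns
via_columns = df.drop(columns=to_drop)
via_axis = df.drop(labels=to_drop, axis=1)
print(via_columns.equals(via_axis))  # True

# index= and columns= together: drop row 0 (Harvard) and the 'Founder' column
smaller = df.drop(index=0, columns='Founder')
print(smaller.shape)  # (4, 4)
```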

Dropping columns using column indices

The `df.columns` attribute returns an Index object containing the column names of the DataFrame.

df.columns
Index(['Name', 'Locations', 'States', 'Founder', 'Founding Year'], dtype='object')

An Index supports positional indexing like a list or array, so you can pick column names by their positions and pass them to drop(). This lets you drop columns by their integer positions.

# df.columns[[1, 3]] selects the names at positions 1 and 3 ('Locations' and 'Founder')
df.drop(df.columns[[1, 3]], axis=1)
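One caveat with this approach: `df.columns[[1, 3]]` turns positions into names, and `drop()` then removes every column with those names. If a DataFrame has duplicate column names, more columns than you expect can disappear. The minimal sketch below uses a small, made-up DataFrame `dup` with a repeated column name `'a'`; selecting the positions to keep with `iloc` drops strictly by position instead.

```python
import pandas as pd

# Hypothetical DataFrame with a duplicated column name 'a'
dup = pd.DataFrame([[1, 2, 3]], columns=['a', 'b', 'a'])

# Position 0 is translated to the name 'a', so BOTH 'a' columns are dropped
by_name = dup.drop(dup.columns[[0]], axis=1)
print(list(by_name.columns))  # ['b']

# Keep every position except 0, so only the first column is removed
positions_to_drop = {0}
keep = [i for i in range(dup.shape[1]) if i not in positions_to_drop]
by_position = dup.iloc[:, keep]
print(list(by_position.columns))  # ['b', 'a']
```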

Using loc indexing

The `loc` indexer selects rows and columns by label, using the form `df.loc[row_labels, column_labels]`.
A bare colon `:` in the row position selects all rows.

Because drop() only needs the column names, you can select the unwanted columns with `loc` and pass the selection to drop().

# Pass column names to the loc indexing method
df.drop(df.loc[:, ['Locations', 'Founder']], axis=1)
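Passing the result of `df.loc[...]` to `drop()` works because iterating over a DataFrame yields its column labels, so `drop()` receives the column names. The sketch below makes that explicit with `.columns`, which some readers may find clearer; both forms give the same result.

```python
import pandas as pd

# Rebuild the article's sample DataFrame so the block runs on its own
df = pd.DataFrame({
    'Name': ['Harvard', 'Yale', 'Cornell', 'Princeton', 'Dartmouth'],
    'Locations': ['Cambridge', 'New Haven', 'Ithaca', 'Princeton', 'Hanover'],
    'States': ['Massachusetts', 'Connecticut', 'New York', 'New Jersey', 'New Hampshire'],
    'Founder': ['John Harvard', 'The Founders', 'Ezra Cornell', 'John Witherspoon', 'George III'],
    'Founding Year': [1650, 1701, 1865, 1746, 1769],
})

selected = df.loc[:, ['Locations', 'Founder']]

# Iterating over a DataFrame yields its column labels
print(list(selected))  # ['Locations', 'Founder']

# Passing the DataFrame itself, or its .columns, drops the same columns
implicit = df.drop(selected, axis=1)
explicit = df.drop(columns=selected.columns)
print(implicit.equals(explicit))  # True
```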

You can also select columns with loc using a boolean mask built from the column names.
For example, the `.str` string methods on `df.columns` let you remove every column whose name matches a pattern.

# df.columns.str.startswith('F') is a boolean array that is True for names starting with 'F'
# This drops 'Founder' and 'Founding Year'
df.drop(df.loc[:, df.columns[df.columns.str.startswith('F')]], axis=1)
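If you would rather match a regular expression than a fixed prefix, `DataFrame.filter(regex=...)` selects the matching columns, and a negated boolean mask keeps everything else. The pattern `'^F'` below is only an example; it matches the same two columns as `startswith('F')`.

```python
import pandas as pd

# Rebuild the article's sample DataFrame so the block runs on its own
df = pd.DataFrame({
    'Name': ['Harvard', 'Yale', 'Cornell', 'Princeton', 'Dartmouth'],
    'Locations': ['Cambridge', 'New Haven', 'Ithaca', 'Princeton', 'Hanover'],
    'States': ['Massachusetts', 'Connecticut', 'New York', 'New Jersey', 'New Hampshire'],
    'Founder': ['John Harvard', 'The Founders', 'Ezra Cornell', 'John Witherspoon', 'George III'],
    'Founding Year': [1650, 1701, 1865, 1746, 1769],
})

# Columns whose names match the regular expression ^F
matching = df.filter(regex='^F').columns
print(list(matching))  # ['Founder', 'Founding Year']
no_f = df.drop(columns=matching)

# Alternative: keep the columns that do NOT start with 'F' using a negated mask
kept = df.loc[:, ~df.columns.str.startswith('F')]
print(list(kept.columns))  # ['Name', 'Locations', 'States']
print(no_f.equals(kept))  # True
```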

Using iloc indexing

You can also access rows and columns of a DataFrame using the `iloc` indexer.
It works like `loc`, but it takes integer positions for rows and columns instead of label names.

# Pass the integer-based index values to the iloc indexing method
df.drop(df.iloc[:, [1, 3]], axis=1)
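Because `iloc` works with positions, it also accepts negative positions and slices. This sketch drops the last two columns of the sample data without naming them, and checks that the result matches simply keeping everything except the last two columns.

```python
import pandas as pd

# Rebuild the article's sample DataFrame so the block runs on its own
df = pd.DataFrame({
    'Name': ['Harvard', 'Yale', 'Cornell', 'Princeton', 'Dartmouth'],
    'Locations': ['Cambridge', 'New Haven', 'Ithaca', 'Princeton', 'Hanover'],
    'States': ['Massachusetts', 'Connecticut', 'New York', 'New Jersey', 'New Hampshire'],
    'Founder': ['John Harvard', 'The Founders', 'Ezra Cornell', 'John Witherspoon', 'George III'],
    'Founding Year': [1650, 1701, 1865, 1746, 1769],
})

# -2: selects the last two columns ('Founder' and 'Founding Year')
last_two = df.iloc[:, -2:].columns
trimmed = df.drop(columns=last_two)
print(list(trimmed.columns))  # ['Name', 'Locations', 'States']

# Equivalent: keep every column except the last two
print(trimmed.equals(df.iloc[:, :-2]))  # True
```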

Using the DataFrame.columns.difference method

`df.columns.difference()` is the `Index.difference` set operation: it returns the column names that are not in the list you pass.
If you pass the names of the columns you want to keep, the result is exactly the set of columns to remove, which you can then hand to drop().

# Pass the column names which are to be retained
df.drop(df.columns.difference(['Name', 'States', 'Founding Year']), axis=1)
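One detail worth knowing: `difference()` sorts its result by default, while `sort=False` keeps the original column order. For dropping this makes no difference, since `drop()` preserves the DataFrame's own order either way. The sketch also shows that selecting the kept columns directly gives the same DataFrame here.

```python
import pandas as pd

# Rebuild the article's sample DataFrame so the block runs on its own
df = pd.DataFrame({
    'Name': ['Harvard', 'Yale', 'Cornell', 'Princeton', 'Dartmouth'],
    'Locations': ['Cambridge', 'New Haven', 'Ithaca', 'Princeton', 'Hanover'],
    'States': ['Massachusetts', 'Connecticut', 'New York', 'New Jersey', 'New Hampshire'],
    'Founder': ['John Harvard', 'The Founders', 'Ezra Cornell', 'John Witherspoon', 'George III'],
    'Founding Year': [1650, 1701, 1865, 1746, 1769],
})

keep = ['Name', 'States', 'Founding Year']

# Default: the difference is sorted
print(list(df.columns.difference(keep)))  # ['Founder', 'Locations']
# sort=False keeps the order in which the columns appear in df
print(list(df.columns.difference(keep, sort=False)))  # ['Locations', 'Founder']

# Dropping the difference is equivalent to selecting the kept columns
dropped = df.drop(columns=df.columns.difference(keep))
selected = df[keep]
print(dropped.equals(selected))  # True
```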

Using the pop method

The pop method removes the specified column from the DataFrame in place and returns the removed column as a pandas Series.

# Pass the name of the column which is to be removed and return it as a pandas Series
founder = df.pop('Founder')
print(founder)
print()  # Print a blank line to separate the two outputs
print(df)
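A minimal sketch of how `pop` behaves on the sample data. Unlike Python's `dict.pop()`, `DataFrame.pop()` takes only the column label and no default value, so popping a column that is not there raises a KeyError.

```python
import pandas as pd

# Rebuild the article's sample DataFrame so the block runs on its own
df = pd.DataFrame({
    'Name': ['Harvard', 'Yale', 'Cornell', 'Princeton', 'Dartmouth'],
    'Locations': ['Cambridge', 'New Haven', 'Ithaca', 'Princeton', 'Hanover'],
    'States': ['Massachusetts', 'Connecticut', 'New York', 'New Jersey', 'New Hampshire'],
    'Founder': ['John Harvard', 'The Founders', 'Ezra Cornell', 'John Witherspoon', 'George III'],
    'Founding Year': [1650, 1701, 1865, 1746, 1769],
})

founder = df.pop('Founder')
print(type(founder).__name__)  # Series
print(founder.name)  # Founder
print('Founder' in df.columns)  # False

# pop() has no default argument, so a missing column raises a KeyError
raised = False
try:
    df.pop('Founder')
except KeyError:
    raised = True
    print('Founder was already removed')
```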

Practical Tips

  1. When you pass column names through the labels parameter of drop() (rather than through columns), set axis=1 (or axis='columns'). Otherwise pandas looks for the names in the row index and raises a KeyError.
  2. The del statement removes one column per target; it does not accept a list of column names.
  3. For a contiguous range of columns, you can pass a slice to loc (by label) or iloc (by position) and drop the selected columns.
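As an illustration of using a slice for contiguous columns, the sketch below drops the columns from 'Locations' to 'Founder'. Note that a label slice with `loc` includes both endpoints, while a position slice with `iloc` excludes the end position.

```python
import pandas as pd

# Rebuild the article's sample DataFrame so the block runs on its own
df = pd.DataFrame({
    'Name': ['Harvard', 'Yale', 'Cornell', 'Princeton', 'Dartmouth'],
    'Locations': ['Cambridge', 'New Haven', 'Ithaca', 'Princeton', 'Hanover'],
    'States': ['Massachusetts', 'Connecticut', 'New York', 'New Jersey', 'New Hampshire'],
    'Founder': ['John Harvard', 'The Founders', 'Ezra Cornell', 'John Witherspoon', 'George III'],
    'Founding Year': [1650, 1701, 1865, 1746, 1769],
})

# Label slice: 'Locations' through 'Founder', both ends included
by_label = df.drop(columns=df.loc[:, 'Locations':'Founder'].columns)

# Position slice: positions 1, 2 and 3 (the end position 4 is excluded)
by_position = df.drop(columns=df.iloc[:, 1:4].columns)

print(list(by_label.columns))  # ['Name', 'Founding Year']
print(by_label.equals(by_position))  # True
```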

Conclusion

In this article, you learnt how to drop columns using the methods:

  1. del statement
  2. drop method
  3. DataFrame.columns.difference method
  4. pop method

Test Your Knowledge

Q1: The pop function removes the specified column from the DataFrame, and returns the DataFrame. True or False?

Answer
False. The pop function removes the specified column from the DataFrame and returns the removed column as a pandas Series.

Q2: Which built-in Python statement can be used to drop a column from a pandas DataFrame?

Answer

The del statement

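Q3: Identify the error in the following code, which is meant to drop the columns ‘col_A’ and ‘col_B’, and write the corrected code:

df.drop(labels=['col_A', 'col_B'], axis=0)

Answer

df.drop(labels=['col_A', 'col_B'], axis=1)

With axis=0, pandas looks for ‘col_A’ and ‘col_B’ among the row labels and raises a KeyError if they are not found there. To drop columns, axis must be 1 (or 'columns').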

Q4: You have a DataFrame df with three columns: ‘col_A’, ‘col_B’ and ‘col_C’. Write the code to remove the column ‘col_C’ and return it as a pandas Series named ser_col_c.

Answer
ser_col_c = df.pop('col_C')

Q5: You have a DataFrame df with three columns: ‘col_A’, ‘col_B’ and ‘col_C’. Write the code to remove the columns ‘col_A’ and ‘col_B’ using the loc indexer. Make sure that the existing DataFrame is modified rather than a new copy being returned.

Answer
df.drop(df.loc[:, ['col_A', 'col_B']], axis=1, inplace=True)

 

This article was contributed by Shreyansh.
