Pandas DataFrame Index

Introduction

A DataFrame can hold a huge number of data records. Each record needs a label that identifies it so that it can be accessed individually. The pandas DataFrame index serves this purpose. Index labels are also commonly referred to as row names or index names.

By default, these row index labels are integers ranging from zero to one less than the total number of rows in the pandas DataFrame. However, user-defined row labels can also be used to identify the rows of the pandas DataFrame.

In this article, you will learn about the different ways of accessing the row index labels of a pandas DataFrame, followed by some practical tips for using them.

Creating a pandas DataFrame

Let’s create a simple pandas DataFrame to understand the concept of an index.

# Create a DataFrame
import pandas as pd

# Create the data of the DataFrame as a dictionary
data_df = {'Name': ['Geoffrey Hinton', 'Ian Goodfellow', 'Ruslan Salakhutdinov', 'Yann Lecun', 'Yoshua Bengio',
                    'Jurgen Schmidhuber', 'Sepp Hochreiter', 'Michael Jordan', 'Ilya Sutskever', 'Andrej Karpathy'],

           'Contribution': ['ANN', 'GAN', 'RBM', 'CNN', 'ANN',
                            'LSTM', 'LSTM', 'Bayesian Networks', 'TensorFlow', 'Computer Vision'],

           'Currently at': ['Google Brain', 'Apple', 'Apple', 'Facebook', 'MILA',
                            'IDSIA', 'JKU', 'UC Berkeley', 'OpenAI', 'Tesla']}


# Create the DataFrame
df = pd.DataFrame(data_df)
df
[Figure: Basic DataFrame for row or index names in pandas]

Access the index using the DataFrame.index attribute

To access the row labels, use the DataFrame.index attribute.

# Use df.index to view the row indices
print(df.index)
RangeIndex(start=0, stop=10, step=1)

The output above shows that the indices are a range of integers that starts at zero and stops before ten.
To view the actual row labels, print the indices as a list.

# Print the index values as a list
print(list(df.index))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
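
A RangeIndex like the one above is a memory-saving index that is described by its start, stop, and step values. The following is a minimal sketch of reading those values directly; the three-row DataFrame named small_df is a hypothetical stand-in, not the article's df.

```python
import pandas as pd

# Hypothetical three-row DataFrame that uses the default integer index
small_df = pd.DataFrame({'Name': ['Geoffrey Hinton', 'Ian Goodfellow', 'Yann Lecun']})

idx = small_df.index

# A RangeIndex exposes the bounds it was built from
print(idx.start, idx.stop, idx.step)  # 0 3 1

# The number of row labels equals the number of rows
print(len(idx) == len(small_df))      # True
```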

Using the DataFrame.index.values attribute

To access the values of the row indices as an array, use the DataFrame.index.values attribute.

# Print the index values as an array
print(df.index.values)
[0 1 2 3 4 5 6 7 8 9]
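
As an alternative to .values, the pandas documentation recommends Index.to_numpy() when a NumPy array is needed. Here is a short sketch of that call, assuming a small hypothetical DataFrame named small_df:

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame with the default integer index
small_df = pd.DataFrame({'Contribution': ['ANN', 'GAN', 'RBM']})

# to_numpy() returns the row labels as a NumPy array
labels = small_df.index.to_numpy()

print(labels)        # [0 1 2]
print(type(labels))  # <class 'numpy.ndarray'>
```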

Iterating over the indices

You can use a loop to iterate over the indices. Let’s see how to print each row label on a new line using a for loop.

# Use a for loop to print all the values of the row labels
for index in df.index:
    print(index)
0
1
2
3
4
5
6
7
8
9
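
Iterating over the labels becomes more useful when each label is used to look up data. The sketch below pairs every row label with a value fetched through .loc; small_df and the pairs list are hypothetical names used only for illustration.

```python
import pandas as pd

# Hypothetical two-row DataFrame with the default integer index
small_df = pd.DataFrame({'Name': ['Geoffrey Hinton', 'Ian Goodfellow'],
                         'Contribution': ['ANN', 'GAN']})

pairs = []
for label in small_df.index:
    # .loc looks up a value by row label and column name
    name = small_df.loc[label, 'Name']
    pairs.append((label, name))
    print(label, name)
```

This lookup returns a single value per label only when the row labels are unique.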

Using the tolist() method

The .tolist() method is used to convert arrays and pandas objects to a list.

You can use this method if you wish to access the row labels as a list.

# Access the row labels as a list
df.index.values.tolist()
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
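
The tolist() method is also available on the index itself, so going through .values first is optional. A minimal sketch on a hypothetical DataFrame named small_df, comparing both routes:

```python
import pandas as pd

# Hypothetical three-row DataFrame with the default integer index
small_df = pd.DataFrame({'Name': ['Geoffrey Hinton', 'Ian Goodfellow', 'Yann Lecun']})

# Call tolist() directly on the index
direct = small_df.index.tolist()

# Call tolist() on the underlying NumPy array
via_values = small_df.index.values.tolist()

print(direct)                # [0, 1, 2]
print(direct == via_values)  # True
```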

Using the DataFrame.axes attribute

The DataFrame.axes attribute returns a list that holds both the row labels and the column labels. Select the element at position 0 to access the row labels.

# Select element 0 of the axes list to see the row labels
df.axes[0]
RangeIndex(start=0, stop=10, step=1)

You can also combine this attribute with the tolist() method to get the row labels as a list.

# View the row labels as a list of values by using the axes attribute
df.axes[0].tolist()
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
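
Since axes holds two entries, position 1 gives the column labels. The following sketch, which uses a hypothetical DataFrame named small_df, unpacks both entries and checks them against df.index and df.columns:

```python
import pandas as pd

# Hypothetical two-row, two-column DataFrame
small_df = pd.DataFrame({'Name': ['Geoffrey Hinton', 'Ian Goodfellow'],
                         'Contribution': ['ANN', 'GAN']})

# axes is a list: the row labels first, then the column labels
row_labels, column_labels = small_df.axes

print(column_labels.tolist())                   # ['Name', 'Contribution']
print(row_labels.equals(small_df.index))        # True
print(column_labels.equals(small_df.columns))   # True
```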

Accessing row labels of a pandas dataframe based on specific conditions

You can also set criteria so that only the labels of the rows that meet them are shown.
For example, to see only those row labels where the value of the Currently at column is ‘Apple’, you can do the following:

# Access only those row labels where the value of the Currently at column is 'Apple'
print(df.index[df['Currently at'] == 'Apple'])
Int64Index([1, 2], dtype='int64')
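
Several conditions can be combined with & (and) or | (or), as long as each condition is wrapped in parentheses. The sketch below assumes a small hypothetical DataFrame named small_df and returns the matching labels as a plain list:

```python
import pandas as pd

# Hypothetical DataFrame with the default integer index
small_df = pd.DataFrame({
    'Contribution': ['ANN', 'LSTM', 'LSTM', 'GAN'],
    'Currently at': ['Google Brain', 'IDSIA', 'JKU', 'Apple'],
})

# Both conditions must hold for a row to be selected
mask = (small_df['Contribution'] == 'LSTM') & (small_df['Currently at'] == 'JKU')

# Index the row labels with the boolean mask and convert them to a list
matching = small_df.index[mask].tolist()
print(matching)  # [2]
```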

Passing a list of user-defined row indices

If you wish to use custom row labels, pass them as a list to the index parameter while creating the pandas DataFrame.

Let’s understand this using the example below. We create a list named ‘ind’ that holds nationalities as the custom row labels, and the data is in the form of a dictionary.

# Create a DataFrame
import pandas as pd

# Create the data of the DataFrame as a dictionary
data_df = {'Name': ['Geoffrey Hinton', 'Ian Goodfellow', 'Ruslan Salakhutdinov', 'Yann Lecun', 'Yoshua Bengio',
                    'Jurgen Schmidhuber', 'Sepp Hochreiter', 'Michael Jordan', 'Ilya Sutskever', 'Andrej Karpathy'],

           'Contribution': ['ANN', 'GAN', 'RBM', 'CNN', 'ANN',
                            'LSTM', 'LSTM', 'Bayesian Networks', 'TensorFlow', 'Computer Vision'],

           'Currently at': ['Google Brain', 'Apple', 'Apple', 'Facebook', 'MILA',
                            'IDSIA', 'JKU', 'UC Berkeley', 'OpenAI', 'Tesla']}


# Create the user-defined row labels as a list (duplicate labels are allowed but not recommended)
ind = ['British', 'American', 'Russian', 'French', 'Canadian',
       'German', 'German', 'American', 'Canadian', 'Slovak']


# Create the DataFrame
df = pd.DataFrame(data_df, index=ind)

print(df)
                          Name       Contribution  Currently at
British        Geoffrey Hinton                ANN  Google Brain
American        Ian Goodfellow                GAN         Apple
Russian   Ruslan Salakhutdinov                RBM         Apple
French              Yann Lecun                CNN      Facebook
Canadian         Yoshua Bengio                ANN          MILA
German      Jurgen Schmidhuber               LSTM         IDSIA
German         Sepp Hochreiter               LSTM           JKU
American        Michael Jordan  Bayesian Networks   UC Berkeley
Canadian        Ilya Sutskever         TensorFlow        OpenAI
Slovak         Andrej Karpathy    Computer Vision         Tesla

Now we have created the pandas DataFrame as desired. Next, let us access the custom index names, or row labels.

# Access the user-defined row labels
print('User-defined index:', df.index)
User-defined index: Index(['British', 'American', 'Russian', 'French', 'Canadian', 'German',
       'German', 'American', 'Canadian', 'Slovak'],
      dtype='object')
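
Custom labels can also be used to select rows with .loc. The following sketch assumes a hypothetical three-row DataFrame named small_df. It shows that a unique label returns a single row, while a duplicated label returns every matching row.

```python
import pandas as pd

# Hypothetical DataFrame whose custom row labels contain a duplicate
small_df = pd.DataFrame(
    {'Name': ['Yann Lecun', 'Jurgen Schmidhuber', 'Sepp Hochreiter']},
    index=['French', 'German', 'German'],
)

# A unique label returns a single row as a Series
french = small_df.loc['French']
print(type(french).__name__)    # Series

# A duplicated label returns all matching rows as a DataFrame
german = small_df.loc['German']
print(german['Name'].tolist())  # ['Jurgen Schmidhuber', 'Sepp Hochreiter']
```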

Practical Tips

  • Keep in mind that DataFrame.index returns an Index object and not a Series object. For the first DataFrame, which uses the default row labels:

print(df.index)
print('df.index type:', type(df.index))
RangeIndex(start=0, stop=10, step=1)
df.index type: <class 'pandas.core.indexes.range.RangeIndex'>

  • DataFrame.index.values returns a NumPy array object.

print(df.index.values)
print('df.index.values type:', type(df.index.values))
[0 1 2 3 4 5 6 7 8 9]
df.index.values type: <class 'numpy.ndarray'>

  • Although pandas allows duplicate values in the row index, it is generally not recommended, because duplicate row labels make it difficult to differentiate between two different rows.
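
To act on the tip about duplicate labels, you can check whether an index is unique before relying on it. The sketch below is one possible approach on a hypothetical DataFrame named small_df, using Index.is_unique, Index.duplicated(), and DataFrame.reset_index():

```python
import pandas as pd

# Hypothetical DataFrame whose custom row labels contain a duplicate
small_df = pd.DataFrame(
    {'Name': ['Yann Lecun', 'Jurgen Schmidhuber', 'Sepp Hochreiter']},
    index=['French', 'German', 'German'],
)

# is_unique is False when any row label repeats
print(small_df.index.is_unique)

# List each label that occurs more than once
dupes = small_df.index[small_df.index.duplicated(keep=False)].unique().tolist()
print(dupes)  # ['German']

# reset_index() moves the labels into a column named 'index'
# and restores the default integer labels, which are unique
reset_df = small_df.reset_index()
print(reset_df.index.is_unique)
```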

For more information, you can check out the official pandas documentation.

Test Your Knowledge

Q1: Duplicate values in user-defined row labels are allowed. True or False?

Answer: True

Q2: Which attribute returns the row indices as a NumPy array?

Answer: DataFrame.index.values

Q3: Find the errors in the given code and write the correct code to print the row labels:

df.axes[1].tolist

Answer: df.axes[0].tolist()

Q4: You have the following DataFrame df:

Write the code to view only those row labels where the type is ‘Part-time Employee’.

Answer: df.index[df['type']=='Part-time Employee']

This article was contributed by Shreyansh B and Shri Varsheni