Pandas Iterate Over Rows

The pandas DataFrame is widely used to store tabular data, and we often need to iterate over its rows to perform various operations. Iterating row by row is one way of navigating a DataFrame.

In this article, you will learn about some of the most popular methods which will help you access rows of the DataFrame as per the requirement. You will also get to know about a few practical tips for using these methods in various situations.

Creating a Basic DataFrame

Let us start by creating a simple DataFrame.

# Create a DataFrame
import pandas as pd


# Create the data of the DataFrame as a dictionary
data_df = {'Name': ['LaMDA', 'GPT-3', 'BERT', 'CodeBERT', 'ELMo', 'XLNet', 'ALBERT', 'RoBERTa'],

           'Developed by': ['Google', 'OpenAI', 'Google', 'Microsoft',
                            'Allen Institute for AI', 'Google', 'Google', 'Facebook'],

           'Year Released': [2021, 2020, 2018, 2020, 2018, 2019, 2019, 2019]}

# Create the DataFrame
df = pd.DataFrame(data_df)
df
Basic Dataframe to iterate over rows in pandas

Using Index labels to iterate rows

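Using a for loop, we can iterate over the rows of the DataFrame by index label. The example below prints the ‘Name’ column value of each row. It relies on the default integer index (0, 1, 2, ...), because df['Name'][i] looks up the value whose index label is i.

# Use index labels to iterate over the row values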
for i in range(len(df)):
    print(df['Name'][i])
LaMDA
GPT-3
BERT
CodeBERT
ELMo
XLNet
ALBERT
RoBERTa
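
When the index labels are not needed, a column can also be iterated directly. The following is a minimal sketch, assuming a shortened copy of the DataFrame above and a hypothetical relabeled index, to show that direct iteration does not depend on the labels:

```python
import pandas as pd

# Shortened copy of the article's DataFrame, used only for illustration
df = pd.DataFrame({'Name': ['LaMDA', 'GPT-3', 'BERT'],
                   'Developed by': ['Google', 'OpenAI', 'Google']})

# Iterating over a column (a Series) yields its values in row order
names = [name for name in df['Name']]
print(names)

# Give the rows non-integer labels (hypothetical labels for this sketch)
relabeled = df.copy()
relabeled.index = ['a', 'b', 'c']

# Direct iteration still yields the same values, whatever the labels are
names_relabeled = list(relabeled['Name'])
print(names_relabeled)
```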

Using the loc indexer to iterate rows

The .loc indexer accesses the rows and columns of a DataFrame by their index labels. Note that it is used with square brackets, as in df.loc[row, column], and is not called like a function.

You can iterate over the rows of the DataFrame by passing a row label and a column label to .loc.

# Pass the row and column labels to .loc to read one value per row
for i in range(len(df)):
    print(df.loc[i, 'Name'], df.loc[i, 'Developed by'])
LaMDA Google
GPT-3 OpenAI
BERT Google
CodeBERT Microsoft
ELMo Allen Institute for AI
XLNet Google
ALBERT Google
RoBERTa Facebook
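
Because .loc works with labels, the same pattern applies when the index is not 0, 1, 2, ... The sketch below assumes a shortened DataFrame that uses the model names as its index (an arrangement chosen only for illustration) and loops over df.index instead of range(len(df)):

```python
import pandas as pd

# Illustrative DataFrame whose index labels are the model names
df = pd.DataFrame({'Developed by': ['Google', 'OpenAI', 'Google'],
                   'Year Released': [2021, 2020, 2018]},
                  index=['LaMDA', 'GPT-3', 'BERT'])

pairs = []
for label in df.index:
    # Look up one cell by its row label and column label
    pairs.append((label, df.loc[label, 'Developed by']))

print(pairs)
```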

Using the iloc indexer to iterate rows

The .iloc indexer accesses the rows and columns of the DataFrame by their integer positions, counted from 0.

Therefore, by specifying the integer positions of the row and the column, you can iterate over the rows of the pandas DataFrame.

# Pass the integer positions of the row and column to .iloc to read one value per row

for i in range(len(df)):
    print(df.iloc[i, 0], df.iloc[i, 1])
LaMDA Google
GPT-3 OpenAI
BERT Google
CodeBERT Microsoft
ELMo Allen Institute for AI
XLNet Google
ALBERT Google
RoBERTa Facebook
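
Since .iloc uses positions rather than labels, it behaves the same whatever the index looks like. As a small sketch (the string index here is an assumption made for the example), the position of a column can be looked up with df.columns.get_loc() rather than hard-coded:

```python
import pandas as pd

# Illustrative DataFrame with non-integer index labels
df = pd.DataFrame({'Developed by': ['Google', 'OpenAI', 'Google'],
                   'Year Released': [2021, 2020, 2018]},
                  index=['LaMDA', 'GPT-3', 'BERT'])

# Find the integer position of a column from its label
year_pos = df.columns.get_loc('Year Released')

# Rows are addressed by position 0..len(df)-1, regardless of their labels
years = [df.iloc[i, year_pos] for i in range(len(df))]
print(year_pos, years)
```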

Using the iterrows() method to iterate rows

The iterrows() method iterates over the rows of the pandas DataFrame. On each iteration it yields a tuple containing the row's index label and the content of the row as a pandas Series.

# Iterate over the row values using the iterrows() method

for ind, row in df.iterrows():
    print(row)
    print('\n') # Use the escape character '\n' to print an empty new line.
Name              LaMDA
Developed by     Google
Year Released      2021
Name: 0, dtype: object


Name              GPT-3
Developed by     OpenAI
Year Released      2020
Name: 1, dtype: object


Name               BERT
Developed by     Google
Year Released      2018
Name: 2, dtype: object


Name              CodeBERT
Developed by     Microsoft
Year Released         2020
Name: 3, dtype: object


Name                               ELMo
Developed by     Allen Institute for AI
Year Released                      2018
Name: 4, dtype: object


Name              XLNet
Developed by     Google
Year Released      2019
Name: 5, dtype: object


Name             ALBERT
Developed by     Google
Year Released      2019
Name: 6, dtype: object


Name              RoBERTa
Developed by     Facebook
Year Released        2019
Name: 7, dtype: object
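
Individual values can be read from each row Series by column label. The sketch below assumes a two-row version of the DataFrame; it also illustrates why the output above reports dtype: object, since each row is packed into a single Series with one common dtype:

```python
import pandas as pd

# Two-row version of the article's DataFrame, for illustration only
df = pd.DataFrame({'Name': ['LaMDA', 'GPT-3'],
                   'Year Released': [2021, 2020]})

lines = []
for ind, row in df.iterrows():
    # row is a Series indexed by the column labels
    lines.append(f"{ind}: {row['Name']} ({row['Year Released']})")
print(lines)

# The column keeps its integer dtype, but a row mixing text and numbers
# is stored as a Series of dtype object
first_row = next(df.iterrows())[1]
print(df['Year Released'].dtype, first_row.dtype)
```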

Now that we have covered the simplest methods, let us look at a few others.

Using the itertuples() method

The itertuples() method iterates over the rows of a pandas DataFrame as namedtuples. Each namedtuple holds the row's index label as its first field (named Index), followed by one field per column value. Column names that are not valid Python identifiers, such as ‘Developed by’, are replaced with positional names such as _2.

  • Syntax: pandas.DataFrame.itertuples(index=True, name='Pandas')
  • Purpose: To iterate over the rows of a DataFrame
  • Parameters:
    • index: Boolean (default: True). It specifies whether the index label of the row should be returned. If True, the index label is returned as the first element of the tuple.
    • name: String or None (default: 'Pandas'). It specifies the name given to the namedtuples. If None, regular tuples are returned instead.
  • Returns: An iterator that yields one namedtuple per row
# Iterate over the row values using the itertuples() method
for row in df.itertuples():
    print(row)
Pandas(Index=0, Name='LaMDA', _2='Google', _3=2021)
Pandas(Index=1, Name='GPT-3', _2='OpenAI', _3=2020)
Pandas(Index=2, Name='BERT', _2='Google', _3=2018)
Pandas(Index=3, Name='CodeBERT', _2='Microsoft', _3=2020)
Pandas(Index=4, Name='ELMo', _2='Allen Institute for AI', _3=2018)
Pandas(Index=5, Name='XLNet', _2='Google', _3=2019)
Pandas(Index=6, Name='ALBERT', _2='Google', _3=2019)
Pandas(Index=7, Name='RoBERTa', _2='Facebook', _3=2019)
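
The fields of each namedtuple can be read by attribute or by position. A minimal sketch, assuming a two-row version of the DataFrame: ‘Name’ is a valid identifier and so is available as row.Name, while ‘Developed by’ is read by position here.

```python
import pandas as pd

# Two-row version of the article's DataFrame, for illustration only
df = pd.DataFrame({'Name': ['LaMDA', 'GPT-3'],
                   'Developed by': ['Google', 'OpenAI'],
                   'Year Released': [2021, 2020]})

summary = []
for row in df.itertuples():
    # Position 0 is the index, 1 is 'Name', 2 is 'Developed by'
    summary.append((row.Index, row.Name, row[2]))
print(summary)

# index=False drops the index field; name=None yields regular tuples
plain = list(df.itertuples(index=False, name=None))
print(plain)
```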

Using the iteritems() method

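The iteritems() method iterates over the columns of a DataFrame. On each iteration it yields a tuple where the first element is the column label and the second element is the column values in the form of a pandas Series. Note that iteritems() was deprecated in pandas 1.5 and removed in pandas 2.0, so on current versions you should use the items() method described in the next section.

# Iterate over the column values using the iteritems() method
# (requires pandas older than 2.0; use df.items() on newer versions)

for col in df.iteritems():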
    print('element at index 0:', col[0])
    print('element at index 1:', col[1])
element at index 0: Name
element at index 1: 0       LaMDA
1       GPT-3
2        BERT
3    CodeBERT
4        ELMo
5       XLNet
6      ALBERT
7     RoBERTa
Name: Name, dtype: object
element at index 0: Developed by
element at index 1: 0                    Google
1                    OpenAI
2                    Google
3                 Microsoft
4    Allen Institute for AI
5                    Google
6                    Google
7                  Facebook
Name: Developed by, dtype: object
element at index 0: Year Released
element at index 1: 0    2021
1    2020
2    2018
3    2020
4    2018
5    2019
6    2019
7    2019
Name: Year Released, dtype: int64

Using the items() method

The items() method behaves the same way as iteritems() and is the method to use on current pandas versions. It also yields a tuple with the column name as the first element and the column values as the second element.

# Iterate over the column values using the items() method

for col in df.items():
    print('element at index 0:', col[0])
    print('element at index 1:', col[1])
element at index 0: Name
element at index 1: 0       LaMDA
1       GPT-3
2        BERT
3    CodeBERT
4        ELMo
5       XLNet
6      ALBERT
7     RoBERTa
Name: Name, dtype: object
element at index 0: Developed by
element at index 1: 0                    Google
1                    OpenAI
2                    Google
3                 Microsoft
4    Allen Institute for AI
5                    Google
6                    Google
7                  Facebook
Name: Developed by, dtype: object
element at index 0: Year Released
element at index 1: 0    2021
1    2020
2    2018
3    2020
4    2018
5    2019
6    2019
7    2019
Name: Year Released, dtype: int64
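
Instead of indexing into the tuple with col[0] and col[1], the pair can be unpacked in the loop header. A short sketch, assuming a three-row version of the DataFrame, that collects the first value of every column:

```python
import pandas as pd

# Three-row version of the article's DataFrame, for illustration only
df = pd.DataFrame({'Name': ['LaMDA', 'GPT-3', 'BERT'],
                   'Developed by': ['Google', 'OpenAI', 'Google'],
                   'Year Released': [2021, 2020, 2018]})

first_values = {}
for label, column in df.items():
    # label is the column name, column is the whole column as a Series
    first_values[label] = column.iloc[0]

print(first_values)
```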

Using the apply() function

You can use the apply() function to apply a function along an axis of a DataFrame, that is, to each column or to each row.

  • Syntax: pandas.DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), **kwds)
  • Purpose: To apply a function along a particular axis of the DataFrame.
  • Parameters:
    • func: The function to apply to each column or row of the DataFrame.
    • axis: 0 or 1 (default: 0). It specifies the axis along which the function is applied. With 0, the function is applied to each column; with 1, it is applied to each row.
    • raw: Boolean (default: False). It specifies whether the rows or columns are passed to the function as pandas Series or as numpy arrays. Set it to True to pass numpy arrays, or to False to pass pandas Series.
    • result_type: ‘expand’, ‘reduce’, ‘broadcast’, or None (default: None). It decides how the result is returned, and only takes effect when axis=1.
      1. ‘expand’ turns list-like results into separate columns.
      2. ‘reduce’ returns a pandas Series if possible instead of expanding list-like results. It is the opposite of ‘expand’.
      3. ‘broadcast’ broadcasts the result to the shape of the original DataFrame.
      4. With None, the return value depends on what the applied function returns.
    • args: A tuple of positional arguments to pass to func in addition to the array or Series.
    • kwds: Additional keyword arguments to pass to func.
  • Returns: A pandas Series or a DataFrame.
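
To make the result_type and args parameters more concrete, here is a minimal sketch. The helper name_and_age and its current_year argument are hypothetical names invented for this example, and the DataFrame is a two-row version of the one above:

```python
import pandas as pd

# Two-row version of the article's DataFrame, for illustration only
df = pd.DataFrame({'Name': ['LaMDA', 'GPT-3'],
                   'Year Released': [2021, 2020]})

def name_and_age(row, current_year):
    # Hypothetical helper: returns a list-like result for each row
    return [row['Name'], current_year - row['Year Released']]

# Default result_type=None: each list is kept as one element of a Series
as_lists = df.apply(name_and_age, axis=1, args=(2024,))
print(as_lists)

# result_type='expand': each list is spread across separate columns
expanded = df.apply(name_and_age, axis=1, args=(2024,), result_type='expand')
print(expanded)
```

Here args=(2024,) supplies the extra positional argument current_year after the row itself.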

By passing axis=1, apply() hands each row to the function as a pandas Series, so you can read the column values of that row by their labels, just as in the index labels method.

# Apply a function to each row (axis=1) and combine two column values

print(df.apply(lambda row: row['Name'] + ', ' + row['Developed by'], axis=1))
0                   LaMDA, Google
1                   GPT-3, OpenAI
2                    BERT, Google
3             CodeBERT, Microsoft
4    ELMo, Allen Institute for AI
5                   XLNet, Google
6                  ALBERT, Google
7               RoBERTa, Facebook
dtype: object
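
The effect of the axis parameter is easiest to see side by side. The following sketch assumes a three-row version of the DataFrame: with the default axis=0 the function receives each column, and with axis=1 it receives each row.

```python
import pandas as pd

# Three-row version of the article's DataFrame, for illustration only
df = pd.DataFrame({'Name': ['LaMDA', 'GPT-3', 'BERT'],
                   'Developed by': ['Google', 'OpenAI', 'Google'],
                   'Year Released': [2021, 2020, 2018]})

# axis=0 (default): the function is called once per column
per_column = df.apply(lambda col: col.nunique())
print(per_column)

# axis=1: the function is called once per row
per_row = df.apply(lambda row: f"{row['Name']} ({row['Year Released']})", axis=1)
print(per_row)
```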

Practical Tips

  1. Remember that the loc indexer accepts row or column labels, while the iloc indexer accepts the integer positions of the rows or the columns.
  2. The iterrows(), iteritems(), and items() methods return a generator object, so you cannot index into the result directly. You can access the values by iterating over it with a loop.
  3. Keep in mind that the iterrows() and itertuples() methods iterate over the rows of the DataFrame, while the iteritems() and items() methods iterate over its columns.
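
As a small illustration of the second tip, the generator returned by iterrows() can be consumed with next() or list() as well as with a for loop. This is only a sketch on a three-row version of the DataFrame:

```python
import pandas as pd

# Three-row version of the article's DataFrame, for illustration only
df = pd.DataFrame({'Name': ['LaMDA', 'GPT-3', 'BERT'],
                   'Year Released': [2021, 2020, 2018]})

rows = df.iterrows()

# next() pulls a single (index, Series) pair from the generator
first_index, first_row = next(rows)
print(first_index, first_row['Name'])

# list() consumes whatever is left; the first row is not repeated
remaining = list(rows)
print(len(remaining))
```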

If you want more information, you can check the official pandas documentation.

Test Your Knowledge

Q1: The iterrows() method returns a list where the first element is the row label and the second element is the row values in the form of a pandas Series. True or False?

Answer: False. The iterrows() method returns a tuple where the first element is the row label and the second element is the row values in the form of a pandas Series.

Q2: Which function or method requires row or column labels for iteration over the DataFrame?

Answer: The loc indexer

Q3: Write the code to iterate over the column values of ‘col_1’ and ‘col_2’ in the DataFrame df using the apply() function.

Answer: print(df.apply(lambda row: row['col_1'] + ', ' + row['col_2'],axis=1))

Q4: Write the code to iterate over the row values of the DataFrame df using the itertuples() method

Answer:

for row in df.itertuples():
    print(row)

Q5: Write the code to iterate over all the rows of the second column of the Dataframe df.

Answer:

for i in range(len(df)):
    print(df.iloc[i, 1])

This article was contributed by Shreyansh B and Shri Varsheni

 

 
