Pandas DataFrames are widely used to store tabular data. Very often, we need to iterate over the rows of a DataFrame to perform various operations. This is one way of navigating the DataFrame.
In this article, you will learn about some of the most popular methods which will help you access rows of the DataFrame as per the requirement. You will also get to know about a few practical tips for using these methods in various situations.
Creating a Basic DataFrame
Let us start by creating a simple DataFrame.
# Create a DataFrame
import pandas as pd
# Create the data of the DataFrame as a dictionary
data_df = {'Name': ['LaMDA', 'GPT-3', 'BERT', 'CodeBERT', 'ELMo', 'XLNet', 'ALBERT', 'RoBERTa'],
           'Developed by': ['Google', 'OpenAI', 'Google', 'Microsoft',
                            'Allen Institute for AI', 'Google', 'Google', 'Facebook'],
           'Year Released': [2021, 2020, 2018, 2020, 2018, 2019, 2019, 2019]}
# Create the DataFrame
df = pd.DataFrame(data_df)
df

To learn more about creating and loading DataFrames, click here.
Using Index labels to iterate rows
Using a for loop, we can iterate over the rows of the DataFrame. The example below prints the ‘Name’ column value of each row. Note that df['Name'][i] looks up the label i, so this loop relies on the DataFrame having the default integer index (0, 1, 2, ...).
# Use index labels to iterate over the row values
for i in range(len(df)):
    print(df['Name'][i])
LaMDA
GPT-3
BERT
CodeBERT
ELMo
XLNet
ALBERT
RoBERTa
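When only one column is needed, the column can also be iterated directly, which avoids depending on the index labels at all. The following is a minimal sketch on a small hypothetical DataFrame (the name `models` and its string index are assumptions made for illustration, not part of the example above):

```python
import pandas as pd

# Small hypothetical DataFrame with non-default (string) index labels
models = pd.DataFrame(
    {'Name': ['BERT', 'ELMo', 'XLNet']},
    index=['a', 'b', 'c'],
)

# Iterating over a column (a pandas Series) yields its values in order,
# regardless of what the index labels are
names = []
for name in models['Name']:
    names.append(name)

print(names)
```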
Using the loc() method to iterate rows
The .loc indexer is used to access the rows and columns of a DataFrame by their index labels. It is written with square brackets, as in df.loc[...], rather than called like a function.
You can iterate over the rows of the DataFrame by passing the row and column labels to .loc.
# Pass the row and column labels to .loc to read one value at a time
for i in range(len(df)):
    print(df.loc[i, 'Name'], df.loc[i, 'Developed by'])
LaMDA Google
GPT-3 OpenAI
BERT Google
CodeBERT Microsoft
ELMo Allen Institute for AI
XLNet Google
ALBERT Google
RoBERTa Facebook
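Because .loc works on labels, the same pattern applies when the index is not the default 0, 1, 2, ... sequence. A short sketch, assuming a hypothetical DataFrame `models` whose index is set to the ‘Name’ column:

```python
import pandas as pd

# Hypothetical DataFrame that uses the model name as the row label
models = pd.DataFrame(
    {'Name': ['LaMDA', 'GPT-3', 'BERT'],
     'Developed by': ['Google', 'OpenAI', 'Google']}
).set_index('Name')

# Iterate over the actual row labels and look each one up with .loc
developers = {}
for name in models.index:
    developers[name] = models.loc[name, 'Developed by']

print(developers)
```

Iterating over `models.index` instead of `range(len(models))` keeps the loop valid for any kind of row label.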
Using the iloc() method to iterate rows
The .iloc indexer is used to access the rows and columns of the DataFrame by their integer positions.
Therefore, by specifying the integer position of the row and of the column, you can iterate over the rows of the pandas DataFrame.
# Pass the integer positions of the row and column to .iloc to read one value at a time
for i in range(len(df)):
    print(df.iloc[i, 0], df.iloc[i, 1])
LaMDA Google
GPT-3 OpenAI
BERT Google
CodeBERT Microsoft
ELMo Allen Institute for AI
XLNet Google
ALBERT Google
RoBERTa Facebook
To learn more about accessing rows or columns using iloc, click here.
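Since .iloc is purely positional, it ignores the index labels entirely. The sketch below assumes a hypothetical DataFrame `models` with arbitrary integer labels and uses `columns.get_loc` to translate a column name into its position:

```python
import pandas as pd

# Hypothetical DataFrame whose index labels (10, 20, 30) differ from the positions (0, 1, 2)
models = pd.DataFrame(
    {'Name': ['BERT', 'ELMo', 'XLNet'],
     'Year Released': [2018, 2018, 2019]},
    index=[10, 20, 30],
)

# Look up the integer position of a column by its name
year_pos = models.columns.get_loc('Year Released')

# Positions always run from 0 to len(models) - 1, whatever the labels are
years = [int(models.iloc[i, year_pos]) for i in range(len(models))]

print(year_pos, years)
```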
Using iterrows() method to iterate rows
The iterrows() method is used to iterate over the rows of the pandas DataFrame. For each row, it yields a tuple which contains the row index label and the content of the row as a pandas Series.
# Iterate over the row values using the iterrows() method
for ind, row in df.iterrows():
    print(row)
    print('\n')  # Print extra blank lines to separate the rows.
Name LaMDA
Developed by Google
Year Released 2021
Name: 0, dtype: object
Name GPT-3
Developed by OpenAI
Year Released 2020
Name: 1, dtype: object
Name BERT
Developed by Google
Year Released 2018
Name: 2, dtype: object
Name CodeBERT
Developed by Microsoft
Year Released 2020
Name: 3, dtype: object
Name ELMo
Developed by Allen Institute for AI
Year Released 2018
Name: 4, dtype: object
Name XLNet
Developed by Google
Year Released 2019
Name: 5, dtype: object
Name ALBERT
Developed by Google
Year Released 2019
Name: 6, dtype: object
Name RoBERTa
Developed by Facebook
Year Released 2019
Name: 7, dtype: object
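One consequence of receiving each row as a Series is that the row has a single dtype, so values may be converted on the way out. The sketch below illustrates this on a hypothetical numeric DataFrame `scores` (the names are assumptions for illustration):

```python
import pandas as pd

# Hypothetical DataFrame with one integer column and one float column
scores = pd.DataFrame({'count': [1, 2], 'ratio': [0.5, 1.5]})

for ind, row in scores.iterrows():
    # Each row is a single Series, so the integer value is upcast to float
    print(ind, row['count'], row['ratio'], row.dtype)

# Take just the first (index, row) pair to inspect it
first_index, first_row = next(scores.iterrows())
```

If the original column dtypes matter, itertuples() is usually the safer choice because it does not pack each row into a Series.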
So far, we have covered the simplest methods.
Using the itertuples() method
The itertuples() method iterates over the rows of a pandas DataFrame as namedtuples. Each namedtuple has the row index label as its first field (named Index), followed by one field per column holding that row's value. Column names that are not valid Python identifiers, such as names containing spaces, are replaced with positional names like _2 and _3.
- Syntax: pandas.DataFrame.itertuples(index=True, name='Pandas')
- Purpose: To iterate over the rows of a DataFrame
- Parameters:
- index: Boolean (default: True). It specifies whether the index label of the row should be returned. If True is passed, the index label is returned as the first element of the tuple.
- name: String or None (default: 'Pandas'). It specifies the name given to the namedtuples. If None is passed, regular tuples are returned.
- Returns: An iterator
# Iterate over the row values using the itertuples() method
for row in df.itertuples():
    print(row)
Pandas(Index=0, Name='LaMDA', _2='Google', _3=2021)
Pandas(Index=1, Name='GPT-3', _2='OpenAI', _3=2020)
Pandas(Index=2, Name='BERT', _2='Google', _3=2018)
Pandas(Index=3, Name='CodeBERT', _2='Microsoft', _3=2020)
Pandas(Index=4, Name='ELMo', _2='Allen Institute for AI', _3=2018)
Pandas(Index=5, Name='XLNet', _2='Google', _3=2019)
Pandas(Index=6, Name='ALBERT', _2='Google', _3=2019)
Pandas(Index=7, Name='RoBERTa', _2='Facebook', _3=2019)
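The fields of each namedtuple can be read by attribute or by position. The following sketch, on a shortened hypothetical copy of the data, shows both; the renamed column labels `developer` and `year` and the tuple name `Model` are illustrative choices, not part of the original example:

```python
import pandas as pd

models = pd.DataFrame({
    'Name': ['LaMDA', 'GPT-3'],
    'Developed by': ['Google', 'OpenAI'],
    'Year Released': [2021, 2020],
})

# Valid identifiers such as Name are available as attributes;
# other columns can be read by position (position 0 is the row label)
for row in models.itertuples():
    print(row.Index, row.Name, row[2], row[3])

# Renaming columns to valid identifiers gives readable field names
renamed = models.rename(
    columns={'Developed by': 'developer', 'Year Released': 'year'}
)
records = [
    (row.Name, row.developer, row.year)
    for row in renamed.itertuples(index=False, name='Model')
]
print(records)
```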
Using the iteritems() method
The iteritems() method is used to iterate over the columns of a DataFrame. For each column, it yields a tuple where the first element is the column label and the second element is the column values in the form of a pandas Series. Note that iteritems() was deprecated in pandas 1.5 and removed in pandas 2.0, so the example below only runs on older versions; on current versions, use items() instead.
# Iterate over the column values using the iteritems() method
for col in df.iteritems():
    print('element at index 0:', col[0])
    print('element at index 1:', col[1])
element at index 0: Name
element at index 1: 0 LaMDA
1 GPT-3
2 BERT
3 CodeBERT
4 ELMo
5 XLNet
6 ALBERT
7 RoBERTa
Name: Name, dtype: object
element at index 0: Developed by
element at index 1: 0 Google
1 OpenAI
2 Google
3 Microsoft
4 Allen Institute for AI
5 Google
6 Google
7 Facebook
Name: Developed by, dtype: object
element at index 0: Year Released
element at index 1: 0 2021
1 2020
2 2018
3 2020
4 2018
5 2019
6 2019
7 2019
Name: Year Released, dtype: int64
Using the items() method
The items() method behaves like the iteritems() method. For each column, it also yields a tuple with the column name as the first element and the column values as the second element.
# Iterate over the column values using the items() method
for col in df.items():
    print('element at index 0:', col[0])
    print('element at index 1:', col[1])
element at index 0: Name
element at index 1: 0 LaMDA
1 GPT-3
2 BERT
3 CodeBERT
4 ELMo
5 XLNet
6 ALBERT
7 RoBERTa
Name: Name, dtype: object
element at index 0: Developed by
element at index 1: 0 Google
1 OpenAI
2 Google
3 Microsoft
4 Allen Institute for AI
5 Google
6 Google
7 Facebook
Name: Developed by, dtype: object
element at index 0: Year Released
element at index 1: 0 2021
1 2020
2 2018
3 2020
4 2018
5 2019
6 2019
7 2019
Name: Year Released, dtype: int64
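The two elements of each tuple can also be unpacked directly in the for statement, which tends to read more clearly than indexing with [0] and [1]. A minimal sketch on a small hypothetical DataFrame:

```python
import pandas as pd

models = pd.DataFrame({
    'Name': ['BERT', 'ELMo'],
    'Year Released': [2018, 2018],
})

# Unpack each (column label, column Series) pair as it is produced
column_summary = {}
for label, values in models.items():
    column_summary[label] = values.tolist()

print(column_summary)
```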
Using the apply() function
You can use the apply() function to apply a function to each column or each row of a DataFrame.
- Syntax: pandas.DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), **kwds)
- Purpose: To apply a function along a particular axis of the DataFrame.
- Parameters:
- func: It denotes the function or operation which will be applied on the DataFrame.
- axis: 0 or 1 (default: 0). This parameter specifies the axis along which the function is applied: 0 applies the function to each column, and 1 applies it to each row.
- raw: Boolean (default: False). It specifies whether the rows or columns are passed as a pandas Series or a numpy array. Set the value to True to pass them as a numpy array, or to False to pass them as a pandas Series.
- result_type: expand, reduce, broadcast, or None (default: None). It decides how the result is returned, and only takes effect when axis=1.
1. The expand argument turns list-like results into different columns.
2. The reduce argument returns a pandas Series if possible, instead of expanding list-like results. It performs the opposite function to the expand argument.
3. The broadcast argument is used to broadcast the result to the shape of the original DataFrame.
4. With None, the return value depends on the value returned by the applied function.
- args: It is used to specify the positional arguments to be passed to the function specified in func, in addition to the array or Series.
- kwds: It is used to specify the additional keyword arguments which are to be used by the function specified in the func parameter.
- Returns: A pandas Series or a DataFrame.
By passing axis=1 and reading the values by their column labels, you can use the apply() function to process the DataFrame row by row.
# Combine two column values of each row using the apply() method
print(df.apply(lambda row: row['Name'] + ', ' + row['Developed by'], axis=1))
0 LaMDA, Google
1 GPT-3, OpenAI
2 BERT, Google
3 CodeBERT, Microsoft
4 ELMo, Allen Institute for AI
5 XLNet, Google
6 ALBERT, Google
7 RoBERTa, Facebook
dtype: object
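To make the effect of the axis and result_type parameters more concrete, here is a small sketch on a hypothetical numeric DataFrame `numbers` (the data and variable names are assumptions for illustration):

```python
import pandas as pd

numbers = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

# axis=0 (the default): the function receives each column as a Series
col_totals = numbers.apply(lambda col: col.sum(), axis=0)

# axis=1: the function receives each row as a Series
row_totals = numbers.apply(lambda row: row.sum(), axis=1)

# result_type='expand' spreads a list-like result over separate columns
expanded = numbers.apply(
    lambda row: [row['a'], row['a'] + row['b']],
    axis=1,
    result_type='expand',
)

print(col_totals)
print(row_totals)
print(expanded)
```

Here `col_totals` has one entry per column, `row_totals` has one entry per row, and `expanded` is a DataFrame with two new columns labelled 0 and 1.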
Practical Tips
- Remember that loc accepts row or column labels, while iloc accepts the integer positions of the rows or the columns.
- The iterrows(), iteritems(), and items() methods return a generator object, so their values cannot be accessed directly. You can access the values by iterating over them with a loop.
- Keep in mind that the iterrows() and itertuples() methods iterate over the rows of the DataFrame, while the iteritems() and items() methods iterate over its columns.
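The second tip can be checked with a short sketch. It assumes a small hypothetical DataFrame and uses the built-in next() function to pull a single item out of the generator:

```python
import pandas as pd

models = pd.DataFrame({
    'Name': ['BERT', 'ELMo'],
    'Year Released': [2018, 2018],
})

row_iter = models.iterrows()
print(row_iter)  # Shows a generator object, not the rows themselves

# next() advances the generator by one (index label, row Series) pair
first_label, first_row = next(row_iter)
print(first_label, first_row['Name'])

# A generator can only be consumed once; list() collects whatever is left
remaining = list(row_iter)
print(len(remaining))
```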
If you want more information, you can check the pandas official documentation here.
Test Your Knowledge
Q1: The iterrows() method returns a list where the first element is the row label and the second element is the row values in the form of a pandas Series. True or False?
Answer: False. The iterrows() method returns a tuple where the first element is the row label and the second element is the row values in the form of a pandas Series.
Q2: Which function or method requires row or column labels for iteration over the DataFrame?
Answer: The loc() method
Q3: Write the code to iterate over the column values of ‘col_1’ and ‘col_2’ in the DataFrame df using the apply() function.
Answer: print(df.apply(lambda row: row['col_1'] + ', ' + row['col_2'], axis=1))
Q4: Write the code to iterate over the row values of the DataFrame df using the itertuples() method.
Answer:
for row in df.itertuples():
    print(row)
Q5: Write the code to iterate over all the rows of the second column of the DataFrame df.
Answer:
for i in range(len(df)):
    print(df.iloc[i, 1])
This article was contributed by Shreyansh B and Shri Varsheni