Traversal is a common operation in most programming languages. Python, for example, iterates over list structures with the for loop. How does pandas traverse Series and DataFrame objects? The two have different data structures and therefore different traversal behavior. This article explains the difference between traversing a Series and a DataFrame and shows examples of each.
1. How To Iterate Over Pandas DataFrame Columns.
- A Series object can be traversed like a one-dimensional array. A DataFrame object has a two-dimensional table structure, and traversing it is similar to traversing a Python dictionary.
- Pandas uses the for loop for data traversal. Looping over a Series yields its values directly, while looping over a DataFrame yields the column labels, which you can then use to get the Series object for each column. Below is an example.
    import pandas as pd
    import numpy as np


    def create_example_dataframe_object():
        # create a 1 dimensional array with numbers.
        array = np.arange(15)
        print('the original array : ')
        print(array)
        print('\r\n')

        # convert the 1 dimensional array to a 2 dimensional array that has 5 rows and 3 columns.
        array_5_rows_3_columns = array.reshape(5, 3)
        print('reshape the original array to 5 rows & 3 columns.\n\r')
        print(array_5_rows_3_columns)
        print('\r\n')

        # create the DataFrame object based on the above 2D array.
        df = pd.DataFrame(array_5_rows_3_columns, columns=['python', 'java', 'javascript'])
        print('the DataFrame object created by the above 2 dimensional array : \r\n')
        print(df)
        print('\r\n')
        return df


    def pandas_dataframe_iterate_by_column():
        df_obj = create_example_dataframe_object()
        for col in df_obj:
            # get the column label related Series object.
            value = df_obj[col]
            print('\ncol: ', col)
            print('type(col): ', type(col))
            print('df_obj[col]: ', value)
            print('type(df_obj[col]): ', type(value))


    if __name__ == '__main__':
        pandas_dataframe_iterate_by_column()
- Running the example above produces the following output.
    the original array : 
    [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]

    reshape the original array to 5 rows & 3 columns.

    [[ 0  1  2]
     [ 3  4  5]
     [ 6  7  8]
     [ 9 10 11]
     [12 13 14]]

    the DataFrame object created by the above 2 dimensional array : 

       python  java  javascript
    0       0     1           2
    1       3     4           5
    2       6     7           8
    3       9    10          11
    4      12    13          14

    col:  python
    type(col):  <class 'str'>
    df_obj[col]:  0     0
    1     3
    2     6
    3     9
    4    12
    Name: python, dtype: int32
    type(df_obj[col]):  <class 'pandas.core.series.Series'>

    col:  java
    type(col):  <class 'str'>
    df_obj[col]:  0     1
    1     4
    2     7
    3    10
    4    13
    Name: java, dtype: int32
    type(df_obj[col]):  <class 'pandas.core.series.Series'>

    col:  javascript
    type(col):  <class 'str'>
    df_obj[col]:  0     2
    1     5
    2     8
    3    11
    4    14
    Name: javascript, dtype: int32
    type(df_obj[col]):  <class 'pandas.core.series.Series'>
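The difference described in this section, that a for loop over a Series yields values while a for loop over a DataFrame yields column labels, can be seen side by side in a minimal sketch. The `scores` and `frame` objects and their contents are hypothetical sample data, not part of the example above.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
scores = pd.Series([90, 85, 70], index=['python', 'java', 'javascript'])
frame = pd.DataFrame({'python': [0, 3], 'java': [1, 4]})

# Iterating a Series yields its values, like a one-dimensional array.
series_items = [value for value in scores]

# Iterating a DataFrame yields its column labels, like the keys of a dict.
frame_items = [label for label in frame]

print(series_items)  # [90, 85, 70]
print(frame_items)   # ['python', 'java']
```

Note that the Series index labels are not returned by the loop; only the values are.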
2. How To Iterate Over Pandas DataFrame Rows.
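- Pandas DataFrame provides 3 iteration methods. Note that the first one walks over the columns, while the other two walk over the rows.
- iteritems(): iterate over the columns as (column label, Series) pairs. It was deprecated in pandas 1.5 and removed in pandas 2.0; items() is the equivalent method.
- iterrows(): iterate over the rows as (row_index, row) pairs, where row is a Series.
- itertuples(): iterate over the rows as named tuples.
- The example below shows how to use these 3 methods.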
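    import pandas as pd
    import numpy as np

    # create_example_dataframe_object() is the helper function defined in the first example.


    def pandas_dataframe_iterate_by_row():
        df_obj = create_example_dataframe_object()

        print('---------- dataframe iteritems() example ----------')
        # items() is called here because iteritems() was removed in pandas 2.0;
        # both yield (column label, Series) pairs.
        for key, value in df_obj.items():
            print('\nkey: ', key)
            print('value: ', value)

        print('\n---------- dataframe iterrows() example ----------')
        # each row is returned as a Series whose index is the column labels.
        for row_index, row in df_obj.iterrows():
            print('\nrow_index: ', row_index)
            print('row: ', row)

        print('\n---------- dataframe itertuples() example ----------')
        # each row is returned as a named tuple that starts with the row index.
        for row in df_obj.itertuples():
            print(row)


    if __name__ == '__main__':
        pandas_dataframe_iterate_by_row()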
- Running the example above produces the following output.
    the original array : 
    [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]

    reshape the original array to 5 rows & 3 columns.

    [[ 0  1  2]
     [ 3  4  5]
     [ 6  7  8]
     [ 9 10 11]
     [12 13 14]]

    the DataFrame object created by the above 2 dimensional array : 

       python  java  javascript
    0       0     1           2
    1       3     4           5
    2       6     7           8
    3       9    10          11
    4      12    13          14

    ---------- dataframe iteritems() example ----------

    key:  python
    value:  0     0
    1     3
    2     6
    3     9
    4    12
    Name: python, dtype: int32

    key:  java
    value:  0     1
    1     4
    2     7
    3    10
    4    13
    Name: java, dtype: int32

    key:  javascript
    value:  0     2
    1     5
    2     8
    3    11
    4    14
    Name: javascript, dtype: int32

    ---------- dataframe iterrows() example ----------

    row_index:  0
    row:  python        0
    java          1
    javascript    2
    Name: 0, dtype: int32

    row_index:  1
    row:  python        3
    java          4
    javascript    5
    Name: 1, dtype: int32

    row_index:  2
    row:  python        6
    java          7
    javascript    8
    Name: 2, dtype: int32

    row_index:  3
    row:  python         9
    java          10
    javascript    11
    Name: 3, dtype: int32

    row_index:  4
    row:  python        12
    java          13
    javascript    14
    Name: 4, dtype: int32

    ---------- dataframe itertuples() example ----------
    Pandas(Index=0, python=0, java=1, javascript=2)
    Pandas(Index=1, python=3, java=4, javascript=5)
    Pandas(Index=2, python=6, java=7, javascript=8)
    Pandas(Index=3, python=9, java=10, javascript=11)
    Pandas(Index=4, python=12, java=13, javascript=14)
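The output above only prints each item. In practice you usually want to read individual fields from it. The following is a minimal sketch of how that might look, assuming a small hypothetical two-row frame that reuses the article's column names. It uses items(), which is available on both older and current pandas versions, in place of iteritems().

```python
import pandas as pd

# Hypothetical two-row frame that reuses the article's column names.
df = pd.DataFrame([[0, 1, 2], [3, 4, 5]],
                  columns=['python', 'java', 'javascript'])

# items() walks the columns as (column label, Series) pairs.
column_sums = {label: int(column.sum()) for label, column in df.items()}

# iterrows() yields (row index, Series); fields are looked up by column label.
java_by_iterrows = [int(row['java']) for _, row in df.iterrows()]

# itertuples() yields named tuples; each column becomes an attribute.
java_by_itertuples = [int(row.java) for row in df.itertuples()]

# index=False drops the Index field and name=None returns plain tuples.
plain_rows = list(df.itertuples(index=False, name=None))

print(column_sums)
print(java_by_iterrows)
print(java_by_itertuples)
print(plain_rows)
```

Attribute access on the named tuples only works when the column labels are valid Python identifiers, which is the case for the three labels used here.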