Pandas DataFrame Reindex Example

This article explains what DataFrame reindexing is, why you need it, and how to do it, with examples.

1. Why Reindex?

  1. Reindexing conforms existing data to a new set of labels, which lets you reorder or select it. If a requested label does not exist in the original DataFrame, the values for that label are filled with NaN.
  2. Reindexing can be applied to the row labels, the column labels, or both. It returns a new DataFrame in which each requested label is matched to the data that carried that label in the original; the original DataFrame object is left unchanged.
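
As a minimal sketch of both points, assuming a small hypothetical DataFrame named `scores` (it is not part of the example later in this article), the snippet below reorders existing rows, requests a row label and a column label that do not exist, and checks that the original object keeps its own labels.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
scores = pd.DataFrame({'score': [90, 80, 70]}, index=['a', 'b', 'c'])

# Reorder the existing rows and request label 'd', which does not exist in scores.
reordered = scores.reindex(['c', 'a', 'd'])
print(reordered)

# Request a column label 'grade' that does not exist; the whole column is NaN.
with_grade = scores.reindex(columns=['score', 'grade'])
print(with_grade)

# reindex() returns a new object, so the original row labels are unchanged.
print(scores.index.tolist())
```

The row for 'd' and the column 'grade' hold NaN because there is no data under those labels in `scores`.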

2. Reset Row And Column Labels.

  1. The source code below contains 2 examples.
  2. The first example reindexes a DataFrame object to select the wanted rows and columns. If a requested row or column does not exist, its values are set to NaN.
  3. The second example reindexes the rows and columns of one DataFrame object based on another DataFrame object with reindex_like(). Here the two DataFrame objects have the same columns; a column that exists only in the other DataFrame object would be filled with NaN.
  4. Below is the example source code.
    import pandas as pd
    
    def reset_row_column_index_label():
        
        # define a 2D array to store user account information.
        users_array = [['Tom', 'Developer', 10000],['Bob', 'QA', 12000],['Jerry', 'Manager', 13000]]
        # define the columns array.
        columns_array = ['Name','Title','Salary']
        # create the DataFrame object.
        df = pd.DataFrame(data = users_array,columns = columns_array)
         
        # print out the source DataFrame object information.
        print('source DataFrame object df: ')
        print(df)
        print('\r\n')
        
        # re-index the DataFrame object's rows and columns.
        # rows 0 and 2 are kept; row 3 does not exist in df, so every value in that row is NaN.
        # the column Department does not exist in df either, so it is filled with NaN; the column Salary is not requested, so it is dropped.
        df_reindexed = df.reindex(index=[0,2,3], columns=['Name', 'Department', 'Title'])
        
        print("df.reindex(index=[0,2,3], columns=['Name', 'Department', 'Title']): ")
        print(df_reindexed)
        print('\r\n')
        
        # create another 2D array with only 2 rows.
        users_array_1 = [['Kevin', 'CEO', 11000],['Richard', 'Developer',8000]]
        # create the DataFrame object based on the users_array_1.
        df_1 = pd.DataFrame(data = users_array_1,columns = columns_array)
        print('df_1:')
        print(df_1)
        print('\r\n')
        
        # call the source DataFrame object's reindex_like() method to give df the same row and column labels as df_1, so df_2 contains only row 0 and row 1 of df.
        df_2 = df.reindex_like(df_1)
        print('df.reindex_like(df_1): ')
        print(df_2)
    
    if __name__ == '__main__':
        
        reset_row_column_index_label()
  5. Running the above example source code produces the output below.
    source DataFrame object df: 
        Name      Title  Salary
    0    Tom  Developer   10000
    1    Bob         QA   12000
    2  Jerry    Manager   13000
    
    
    df.reindex(index=[0,2,3], columns=['Name', 'Department', 'Title']):
        Name  Department      Title
    0    Tom         NaN  Developer
    2  Jerry         NaN    Manager
    3    NaN         NaN        NaN
    
    
    df_1:
          Name      Title  Salary
    0    Kevin        CEO   11000
    1  Richard  Developer    8000
    
    
    df.reindex_like(df_1):
      Name      Title  Salary
    0  Tom  Developer   10000
    1  Bob         QA   12000
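
To make the second example more concrete, the sketch below rebuilds the same two DataFrame objects and compares reindex_like() with a plain reindex() call that is given the other object's row and column labels. The variable names `via_like` and `via_reindex` are introduced here for illustration only.

```python
import pandas as pd

columns_array = ['Name', 'Title', 'Salary']
df = pd.DataFrame(
    data=[['Tom', 'Developer', 10000], ['Bob', 'QA', 12000], ['Jerry', 'Manager', 13000]],
    columns=columns_array,
)
df_1 = pd.DataFrame(
    data=[['Kevin', 'CEO', 11000], ['Richard', 'Developer', 8000]],
    columns=columns_array,
)

# Align df to the labels of df_1 in two ways.
via_like = df.reindex_like(df_1)
via_reindex = df.reindex(index=df_1.index, columns=df_1.columns)

# Compare the two results; equals() checks both the labels and the values.
print(via_like.equals(via_reindex))
```

Only the labels of df_1 are used; the values in the result still come from df.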
    
