Pandas User Defined Functions Examples

To apply your own functions, or functions from other libraries, to pandas objects, you can use one of three methods. 1). Use pipe() to operate on the entire DataFrame object. 2). Use apply() to operate on the rows or columns of a DataFrame object. 3). Use applymap() to operate on each individual element of a DataFrame object (pandas 2.1 renamed this method to DataFrame.map() and deprecated applymap()). This article shows how to use each of them.

1. The Example DataFrame Data Sheet.

  1. The create_example_dataframe_object() function in this example creates the DataFrame object that the later examples use.
    import pandas as pd
    
    import numpy as np
    
    def create_example_dataframe_object():
        
        # create a 1 dimensional array with numbers.
        array = np.arange(15)
        print('the original array : ')
        print(array)
        
        print('\r\n')
        
        # convert the  1 dimensional array to a 2 dimensional array that has 5 rows and 3 columns.
        array_5_rows_3_columns = array.reshape(5, 3)
        
        print('reshape the original array to 5 rows & 3 columns.\n\r')
        
        print(array_5_rows_3_columns)
        
        print('\r\n')
        
        # create the DataFrame object based on the above 2D array.
        df = pd.DataFrame(array_5_rows_3_columns, columns=['python', 'java', 'javascript'])
        
        print('the DataFrame object created by the above 2 dimensional array : \r\n')
        print(df)
        
        print('\r\n')
        
        return df
    
    if __name__ == '__main__':
        
        create_example_dataframe_object()
    
    
  2. When you run the above example, you get the result below. The example DataFrame object has 5 rows and 3 columns.
    the original array : 
    [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]
    
    
    
    reshape the original array to 5 rows & 3 columns.
    
    
    [[ 0  1  2]
     [ 3  4  5]
     [ 6  7  8]
     [ 9 10 11]
     [12 13 14]]
    
    
    
    the DataFrame object created by the above 2 dimensional array : 
    
    
       python  java  javascript
    0       0     1           2
    1       3     4           5
    2       6     7           8
    3       9    10          11
    4      12    13          14
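
For reuse in other scripts, the same frame can be built without the diagnostic prints. The following is a minimal sketch; the name make_example_df is a hypothetical helper introduced here for illustration and is not part of the original example.

```python
import numpy as np
import pandas as pd


def make_example_df():
    """Build the 5x3 example frame without printing intermediate steps."""
    # 0..14 reshaped to 5 rows and 3 columns, as in the example above.
    values = np.arange(15).reshape(5, 3)
    return pd.DataFrame(values, columns=['python', 'java', 'javascript'])


df = make_example_df()
print(df.shape)
print(list(df.columns))
```

The helper returns an equivalent DataFrame, so any of the later examples could call it in place of create_example_dataframe_object() when the extra console output is not wanted.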
    

2. Operate On The Entire DataFrame Data Sheet.

  1. The pipe() function passes the whole DataFrame object to a custom function as its first argument, followed by any additional arguments you supply, and returns whatever that function returns.
  2. The following example adds 2 to every element in a pandas DataFrame object.
    import pandas as pd
    
    import numpy as np
    
    # this function will add the 2 input values and return the result.
    def add_func(val1,val2):
        return val1+val2
    
    def operate_entire_dataframe_object():
        
        # create_example_dataframe_object() is the function defined in section 1.
        df = create_example_dataframe_object()
        
        # call the DataFrame object's pipe() function to operate the entire DataFrame object with the add_func function.
        df1 = df.pipe(add_func, 2)
        
        print('\r\n')
        
        print('after invoke the pipe function to the DataFrame object:\r\n')
        
        print(df1)   
    
    if __name__ == '__main__':
        
        operate_entire_dataframe_object()
  3. When you run the above example source code, you will get the below output.
    the original array : 
    [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]
    
    
    
    reshape the original array to 5 rows & 3 columns.
    
    
    [[ 0  1  2]
     [ 3  4  5]
     [ 6  7  8]
     [ 9 10 11]
     [12 13 14]]
    
    
    
    the DataFrame object created by the above 2 dimensional array : 
    
    
       python  java  javascript
    0       0     1           2
    1       3     4           5
    2       6     7           8
    3       9    10          11
    4      12    13          14
    
    
    
    
    
    after invoke the pipe function to the DataFrame object:
    
    
       python  java  javascript
    0       2     3           4
    1       5     6           7
    2       8     9          10
    3      11    12          13
    4      14    15          16
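
Because pipe() returns whatever the custom function returns, several calls can be chained. The sketch below assumes three small hypothetical helpers (add_value, scale, scale_by) that are not part of the original example. It also shows the (callable, keyword) tuple form of pipe(), which is useful when the DataFrame is not the function's first parameter.

```python
import numpy as np
import pandas as pd


def add_value(frame, amount):
    """Add a constant to every element."""
    return frame + amount


def scale(frame, factor=1):
    """Multiply every element by a factor."""
    return frame * factor


def scale_by(factor, frame):
    """Same as scale(), but the DataFrame is the second parameter."""
    return frame * factor


df = pd.DataFrame(np.arange(15).reshape(5, 3),
                  columns=['python', 'java', 'javascript'])

# Chain two steps: first add 2, then multiply by 10.
chained = df.pipe(add_value, 2).pipe(scale, factor=10)

# Tell pipe() which keyword should receive the DataFrame.
same = df.pipe((scale_by, 'frame'), 10)

print(chained)
print(same)
```

Chaining reads left to right in the order the steps run, which is usually easier to follow than the nested call scale(add_value(df, 2), factor=10).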
    

3. Operate On The DataFrame Object Rows Or Columns.

  1. If you want to operate on each row or each column of a DataFrame object, you can use the apply() method.
  2. Like the descriptive statistics methods, apply() has an optional axis parameter. By default (axis=0) the function receives each column as a Series; with axis=1 it receives each row.
  3. Below is an example.
    import pandas as pd
    
    import numpy as np
    
    def operate_dataframe_rows_columns():
        
        # create_example_dataframe_object() is the function defined in section 1.
        df = create_example_dataframe_object()
        
        # apply the np.sum function to each column by default.
        df1 = df.apply(np.sum)
        
        print('\r\n')
        
        print('after apply the np.sum function to the DataFrame object columns : \r\n')
        
        print(df1)
        
        # if you want to apply the np.mean function to each row of the DataFrame object, you can pass axis=1 to the apply method.
        df2 = df.apply(np.mean, axis=1)
        
        print('\r\n')
        
        print('after apply the np.mean function to the DataFrame object rows : \r\n')
        
        print(df2)
        
        # get the difference between the maximum and minimum values in each column
        df3 = df.apply(lambda x: x.max() - x.min())
        
        print('\r\n')
        
        print('get the difference between the maximum and minimum values in each column : \r\n')
        
        print(df3)
        
        
        # get the difference between the maximum and minimum values in each row
        df4 = df.apply(lambda x: x.max() - x.min(), axis=1)
        
        print('\r\n')
        
        print('get the difference between the maximum and minimum values in each row : \r\n')
        
        print(df4)    
       
    
    if __name__ == '__main__':
        
        operate_dataframe_rows_columns()
  4. When you run the above example source code, you will get the below result.
    the original array : 
    [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]
    
    
    
    reshape the original array to 5 rows & 3 columns.
    
    
    [[ 0  1  2]
     [ 3  4  5]
     [ 6  7  8]
     [ 9 10 11]
     [12 13 14]]
    
    
    
    the DataFrame object created by the above 2 dimensional array : 
    
    
       python  java  javascript
    0       0     1           2
    1       3     4           5
    2       6     7           8
    3       9    10          11
    4      12    13          14
    
    
    
    
    
    
    after apply the np.sum function to the DataFrame object columns : 
    
    
    python        30
    java          35
    javascript    40
    dtype: int64
    
    
    
    after apply the np.mean function to the DataFrame object rows : 
    
    
    0     1.0
    1     4.0
    2     7.0
    3    10.0
    4    13.0
    dtype: float64
    
    
    
    get the difference between the maximum and minimum values in each column : 
    
    
    python        12
    java          12
    javascript    12
    dtype: int64
    
    
    
    get the difference between the maximum and minimum values in each row : 
    
    
    0    2
    1    2
    2    2
    3    2
    4    2
    dtype: int64
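
The apply() method also forwards extra keyword arguments to the function, and the function may return a Series instead of a single value. The following is a minimal sketch; value_range and its scale parameter are hypothetical names introduced for illustration.

```python
import numpy as np
import pandas as pd


def value_range(values, scale=1):
    """Return (max - min) of a row or column, multiplied by scale."""
    return (values.max() - values.min()) * scale


df = pd.DataFrame(np.arange(15).reshape(5, 3),
                  columns=['python', 'java', 'javascript'])

# Per column (axis=0 is the default).
col_range = df.apply(value_range)

# Per row, forwarding the extra keyword argument scale=2 to value_range.
row_range = df.apply(value_range, axis=1, scale=2)

# Returning a Series from the function yields one result row per label.
summary = df.apply(lambda col: pd.Series({'min': col.min(), 'max': col.max()}))

print(col_range)
print(row_range)
print(summary)
```

Here summary is a DataFrame with the rows min and max and one column per original column, which is a compact way to compute several statistics in one pass.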
    

4. Operate On Each DataFrame Object Element.

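  1. The DataFrame object’s applymap() method is the element-wise counterpart of the Series object’s map() method. Both accept a Python function, call it on every element, and return a new object holding the results. Since pandas 2.1, applymap() is deprecated in favor of DataFrame.map(), which behaves the same way.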
    import pandas as pd
    
    import numpy as np
    
    def operate_dataframe_single_element():
        
        # create_example_dataframe_object() is the function defined in section 1.
        df = create_example_dataframe_object()
        
        col_series = df['python']
        
        print("df['python'] : \r\n")
        
        print(col_series)
        
        # build a new Series in which each element x is replaced by x**3 (the original Series is unchanged).
        col_series_map= col_series.map(lambda x:x**3)
        
        print('\r\n')
        
        print('col_series.map(lambda x:x**3): \r\n')
        
        print(col_series_map)
        
        print('\r\n') 
        
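        # multiply every element of the DataFrame by 10 (on pandas 2.1 and later, df.map() is the preferred name).
        df1 = df.applymap(lambda x:x*10)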
        
        print('df.applymap(lambda x:x*10): \r\n')
        
        print(df1)
        
           
    
    if __name__ == '__main__':
        
        operate_dataframe_single_element()
  2. When you run the above example, you will get the below output.
    the original array : 
    [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]
    
    
    
    reshape the original array to 5 rows & 3 columns.
    
    
    [[ 0  1  2]
     [ 3  4  5]
     [ 6  7  8]
     [ 9 10 11]
     [12 13 14]]
    
    
    
    the DataFrame object created by the above 2 dimensional array : 
    
    
       python  java  javascript
    0       0     1           2
    1       3     4           5
    2       6     7           8
    3       9    10          11
    4      12    13          14
    
    
    
    df['python'] : 
    
    
    0     0
    1     3
    2     6
    3     9
    4    12
    Name: python, dtype: int32
    
    
    
    col_series.map(lambda x:x**3): 
    
    
    0       0
    1      27
    2     216
    3     729
    4    1728
    Name: python, dtype: int64
    
    
    
    df.applymap(lambda x:x*10): 
    
    
       python  java  javascript
    0       0    10          20
    1      30    40          50
    2      60    70          80
    3      90   100         110
    4     120   130         140
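
Since applymap() is deprecated from pandas 2.1 onward, code that has to run on both older and newer versions can pick the method at run time. The sketch below assumes a hypothetical wrapper named elementwise; it is one possible approach, not an official pandas recipe.

```python
import numpy as np
import pandas as pd


def elementwise(frame, func):
    """Apply func to every element, preferring DataFrame.map when it exists."""
    if hasattr(pd.DataFrame, 'map'):
        # pandas 2.1 and later
        return frame.map(func)
    # older pandas versions
    return frame.applymap(func)


df = pd.DataFrame(np.arange(15).reshape(5, 3),
                  columns=['python', 'java', 'javascript'])

times_ten = elementwise(df, lambda x: x * 10)
print(times_ten)

# For plain arithmetic, the vectorized expression gives the same values
# and avoids calling a Python function once per element.
vectorized = df * 10
print(vectorized)
```

Element-wise mapping is most useful when the function has no vectorized equivalent, for example formatting each value as a string.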
    
