Mutate for Pandas Dataframes: Examples with Assign


mutate is a very popular function in R's dplyr package.

Many people are familiar with R and would like similar behaviour in pandas (this guide is also useful for those who've never used R).

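The pandas assign() method is the equivalent of mutate: it adds new columns to a dataframe (or overwrites existing ones), usually derived from the columns already there.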
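As a rough sketch of what assign() accepts: each keyword argument becomes a column name, and its value can be a scalar (repeated on every row), a Series or array of matching length, or a callable that receives the dataframe and returns the new column. The columns `country`, `age_in_months` and `name_length` below are hypothetical, chosen only to illustrate the three kinds of value.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['alice', 'bob', 'charlie', 'daniel'],
    'age': [25, 66, 56, 78]
})

result = df.assign(
    # a scalar is broadcast to every row
    country='unknown',
    # a Series (or list/array of the same length) is used as-is
    age_in_months=df['age'] * 12,
    # a callable receives the dataframe and returns the column values
    name_length=lambda d: d['name'].str.len(),
)
print(result)
```

New columns are appended after the existing ones, in the order the keyword arguments are given.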

Why use assign?

Because assign() returns a new dataframe instead of modifying the original, you can use it to avoid defining tons of intermediate dataframes in your code. This is especially handy in Jupyter Notebooks or similar exploratory tools.
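A minimal check that assign() leaves the original dataframe untouched and hands back a new one (the `age_next_year` column is a hypothetical example):

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['alice', 'bob', 'charlie', 'daniel'],
    'age': [25, 66, 56, 78]
})

# assign() returns a NEW dataframe; df itself is not modified
df2 = df.assign(age_next_year=lambda d: d['age'] + 1)

print(list(df.columns))   # ['name', 'age']
print(list(df2.columns))  # ['name', 'age', 'age_next_year']
```

If you want to keep the result, you still have to bind it to a name (or reassign it to df); the point is that you only need one name at the end of the expression, not one per step.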

Simplest possible example

Template: `df.assign(new_column_name = value)`

import pandas as pd

df = pd.DataFrame({
    'name': ['alice','bob','charlie','daniel'],
    'age': [25,66,56,78]
})

# add a boolean column: True for people aged 65 or more
df.assign(
    is_senior = lambda dataframe: dataframe['age'].map(lambda age: age >= 65)
)

BEFORE: the original dataframe

AFTER: a derived column added using the assign method
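The `.map(lambda ...)` above applies a Python function one element at a time. As a sketch of an alternative (not what the example above uses), the same column can be computed with a vectorized comparison on the whole Series, which is typically shorter and faster:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['alice', 'bob', 'charlie', 'daniel'],
    'age': [25, 66, 56, 78]
})

# element-wise version, as in the example above
with_map = df.assign(is_senior=lambda d: d['age'].map(lambda age: age >= 65))

# vectorized version: the comparison runs on the whole column at once
vectorized = df.assign(is_senior=lambda d: d['age'] >= 65)

print(vectorized['is_senior'].tolist())                       # [False, True, False, True]
print((with_map['is_senior'] == vectorized['is_senior']).all())  # True
```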

Chain application

import pandas as pd

df = pd.DataFrame({
    'name': ['alice','bob','charlie','daniel'],
    'age': [25,66,56,78]
})

df.assign(
    is_senior = lambda dataframe: dataframe['age'].map(lambda age: age >= 65)
).assign(
    # each lambda receives the dataframe produced by the previous step
    name_uppercase = lambda dataframe: dataframe['name'].map(lambda name: name.upper())
).assign(
    # name_uppercase only exists because of the previous .assign() call
    name_uppercase_double = lambda dataframe: dataframe['name_uppercase'].map(lambda name: name + "-" + name)
)

BEFORE: the original dataframe

AFTER: using .assign you can chain multiple operations that depend on the previous ones without creating intermediate variables
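Chaining is not the only option. According to the pandas documentation, from version 0.23 (on Python 3.6+) keyword arguments to a single assign() call are computed and assigned in order, so a later callable can refer to a column created by an earlier one in the same call. A sketch of the chain above written that way, assuming a pandas version with that behaviour:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['alice', 'bob', 'charlie', 'daniel'],
    'age': [25, 66, 56, 78]
})

# keyword arguments are evaluated in order, so name_uppercase_double
# can use name_uppercase, which is created earlier in the same call
result = df.assign(
    is_senior=lambda d: d['age'] >= 65,
    name_uppercase=lambda d: d['name'].str.upper(),
    name_uppercase_double=lambda d: d['name_uppercase'] + '-' + d['name_uppercase'],
)
print(result['name_uppercase_double'].tolist())
# ['ALICE-ALICE', 'BOB-BOB', 'CHARLIE-CHARLIE', 'DANIEL-DANIEL']
```

Both styles give the same columns; separate .assign() calls can still be clearer when each step deserves its own comment.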
