Pandas Dataframe Examples: String Functions


WIP Alert: This is a work in progress. Current information is correct but more content may be added in the future.

View all examples on this jupyter notebook

Calling .str on a Series that contains strings gives you access to string methods that are applied to every element of the Series.
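As a quick illustration of the accessor, here is a minimal sketch on a made-up two-element Series (the variable names are only for this example):

```python
import pandas as pd

# hypothetical sample data, not taken from the notebook
names = pd.Series(['alice smith', 'bob jones'])

# each method runs on every element and returns a new Series
upper = names.str.upper()
lengths = names.str.len()

print(upper.tolist())
print(lengths.tolist())
```

The original Series is left unchanged; each call returns a new one.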

Select rows by partial string

Use .str.contains(). Set regex=False to match the text literally, which also performs better:

import pandas as pd

df = pd.DataFrame({
    'name': ['alice smith','bob jones','charlie joneson','daisy white'],
    'age': [25,20,30,35]
})

df[df['name'].str.contains('jones',regex=False)]

[Figure: Original dataframe]

[Figure: Rows whose name column contains 'jones' anywhere]
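Besides speed, regex=False also matters when the search text contains characters that have a special meaning in regular expressions. A small sketch, assuming a made-up Series of file names:

```python
import pandas as pd

# hypothetical sample data for illustration only
files = pd.Series(['report.csv', 'report_csv', 'notes.txt'])

# literal match: only values containing the exact text '.csv'
literal = files[files.str.contains('.csv', regex=False)]

# default (regex=True): '.' matches any character, so '_csv' matches too
as_regex = files[files.str.contains('.csv')]

print(literal.tolist())
print(as_regex.tolist())
```

With the literal match only 'report.csv' is kept, while the regex version also keeps 'report_csv'.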

Select rows by regular expression

Use .str.contains(); by default the pattern is treated as a regular expression:

import pandas as pd

df = pd.DataFrame({
    'name': ['alice smith','bob jones','charlie joneson','daisy white'],
    'age': [25,20,30,35]
})

# names starting with 'b' or 'd'
# (the character class keeps the ^ anchor applied to both letters)
df[df['name'].str.contains('^[bd]')]

[Figure: Original dataframe]

[Figure: Rows whose name column starts with either a 'b' or a 'd']
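One thing to watch when combining anchors with alternation: in a regular expression `|` binds more loosely than `^`, so an anchor written before the first alternative does not apply to the second. The sketch below uses a made-up Series (it includes a name that contains a 'd' without starting with one) to show the difference:

```python
import pandas as pd

# hypothetical sample data for illustration only
names = pd.Series(['alice smith', 'bob jones', 'ed dodd', 'daisy white'])

# '^b|d' means: starts with 'b', OR contains 'd' anywhere
loose = names[names.str.contains('^b|d')]

# '^[bd]' means: starts with 'b' or 'd'
strict = names[names.str.contains('^[bd]')]

print(loose.tolist())
print(strict.tolist())
```

Here 'ed dodd' is kept by the first pattern but not by the second.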

Concatenate two string columns

Just use the + sign:

import pandas as pd

df = pd.DataFrame({
    'first_name': ['alice','bob','charlie','daisy'],
    'last_name':['smith','jones','joneson','white'],
    'age': [25,20,30,35]
})

# just add the two columns
df['full_name'] = df['first_name'] + df['last_name']

[Figure: Original dataframe with two separate columns]

[Figure: AFTER: columns concatenated. See below for how to add whitespace between the names]

You can also put a plain string (such as a space) between the columns; pandas broadcasts that string to every row:

df['full_name'] = df['first_name'] + ' ' + df['last_name']

[Figure: Alternative version: You can also add simple whitespace in between the two columns so that the names are correct.]
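An alternative worth knowing is .str.cat(), which takes the separator as an argument and lets you choose what to do with missing values. This is a sketch on a made-up dataframe where one last name is missing, not part of the original notebook:

```python
import pandas as pd

# hypothetical sample data with a missing last name
df = pd.DataFrame({
    'first_name': ['alice', 'bob'],
    'last_name': ['smith', None],
})

# with +, a missing value on either side makes the result missing
plus = df['first_name'] + ' ' + df['last_name']

# with str.cat, na_rep replaces missing values before joining
cat = df['first_name'].str.cat(df['last_name'], sep=' ', na_rep='')

print(plus.isna().tolist())
print(cat.tolist())
```

Which behaviour you want depends on whether a partial name is acceptable in your data.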

Split string column into multiple columns

  • Create a function that takes a string and returns a series with the columns you want

  • Use apply() on the original dataframe

  • Concatenate the created columns onto the original dataframe

import pandas as pd

df = pd.DataFrame({
    'name': ['alice smith','bob jones','charlie joneson','daisy white'],
    'age': [25,20,30,35]
})

# a function that takes the value and returns
# a series with as many columns as you want
def split_name(name):
    # split on the first space only, so any extra words stay in last_name
    first_name, last_name = name.split(' ', 1)

    return pd.Series({
        'first_name': first_name,
        'last_name': last_name
    })

# df_new has the new columns
df_new = df['name'].apply(split_name)

# append the columns to the original dataframe
df_final = pd.concat([df,df_new],axis=1)

[Figure: Original dataframe has a single column with the full name]

[Figure: Two new columns were created by splitting name into two]
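For a simple delimiter-based split like this one, the same result can probably be reached without apply(), using .str.split() with expand=True. A minimal sketch, assuming every name has at least one space:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['alice smith', 'bob jones', 'charlie joneson', 'daisy white'],
    'age': [25, 20, 30, 35]
})

# n=1 splits on the first space only; expand=True returns a dataframe
parts = df['name'].str.split(' ', n=1, expand=True)
parts.columns = ['first_name', 'last_name']

# append the new columns to the original dataframe
df_final = pd.concat([df, parts], axis=1)

print(df_final)
```

The apply() approach remains the more flexible option when the split logic needs more than a delimiter.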
