Using SQL-like Syntax with Pandas Dataframe: Query and Eval Examples

Last updated:
Table of Contents

WIP Alert This is a work in progress. Current information is correct but more content may be added in the future.

Why use query/eval?

Because it enables you to create views and filters inplace.

# needs two steps
# one to assign the dataframe to a variable
df = pd.DataFrame({
    'name':['john','david','anna'],
    'country':['USA','UK',np.nan]
})

# another one to perform the filter
df[df['country']=='USA']

But you can define the dataframe and query on it in a single step (memory gets freed at once because you didn't create any temporary variables)

# this is equivalent to the code above
# and uses no intermediate variables
pd.DataFrame({
    'name':['john','david','anna'],
    'country':['USA','UK',np.nan]
}).query("country == 'USA'")

Query using python variables

Use @variable_name:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'name':['john','david','anna'],
    'country':['USA','UK',np.nan],
    'age':[23,45,45]
})

target_age = 45

df.query('age == @target_age')

Query where is null

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'name':['john','david','anna'],
    'country':['USA','UK',np.nan]
})

df.query('country.isnull()')

original-dataframe Original Dataframe
         
query-where-is-null Equivalent to:
"where country is NULL"

Query where is NOT null

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'name':['john','david','anna'],
    'country':['USA','UK',np.nan]
})

df.query('country.notnull()')

original-dataframe Original Dataframe
         
query-where-is-null Equivalent to:
"where country is NOT NULL"

Twitter Linkedin YC Hacker News Reddit

References

Dialogue & Discussion