Using SQL-like Syntax with Pandas Dataframe: Query and Eval Examples

Last updated:
Table of Contents

WIP Alert This is a work in progress. Current information is correct but more content may be added in the future.

Why use query/eval?

Because it enables you to create views and filters inplace.

# needs two steps
# one to assign the dataframe to a variable
df = pd.DataFrame({
    'name':['john','david','anna'],
    'country':['USA','UK',np.nan]
})

# another one to perform the filter
df[df['country']=='USA']

But you can define the dataframe and query on it in a single step (memory gets freed at once because you didn't create any temporary variables)

# this is equivalent to the code above
# and uses no intermediate variables
pd.DataFrame({
    'name':['john','david','anna'],
    'country':['USA','UK',np.nan]
}).query("country == 'USA'")

Query where is null

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'name':['john','david','anna'],
    'country':['USA','UK',np.nan]
})

df.query('country.isnull()')

original-dataframe Original Dataframe
         
query-where-is-null Equivalent to:
"where country is NULL"

Query where is NOT null

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'name':['john','david','anna'],
    'country':['USA','UK',np.nan]
})

df.query('country.notnull()')

original-dataframe Original Dataframe
         
query-where-is-null Equivalent to:
"where country is NOT NULL"


References

Dialogue & Discussion