Pandas DataFrame: GroupBy Examples

WIP Alert: This is a work in progress. Current information is correct, but more content may be added in the future.

For DataFrame usage examples not related to GroupBy, see Pandas Dataframe by Example.

All examples in this post are available in this notebook: pandas-groupby-post

Concatenate strings in a group

This is called GROUP_CONCAT in databases such as MySQL.

In the original dataframe, each row is a tag assignment: one user giving one tag to one piece of content.

import pandas as pd

df = pd.DataFrame({
    'user_id': [1, 2, 1, 3, 3],
    'content_id': [1, 1, 2, 2, 2],
    'tag': ['cool', 'nice', 'clever', 'clever', 'not-bad']
})

# join all tags within each content_id group into one comma-separated string
df.groupby("content_id")['tag'].apply(lambda tags: ','.join(tags))

After the operation, we have one row per content_id and all tags are joined with ','.

[Figure: Source dataframe]
[Figure: All tags given to each content]
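
As a minimal sketch of the same idea, the join can also be passed straight to agg instead of being wrapped in a lambda. The second variant, which drops repeated tags inside each group, is an assumption about what a reader might want and is not part of the original example; the names tags_by_content and unique_tags_by_content are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'user_id': [1, 2, 1, 3, 3],
    'content_id': [1, 1, 2, 2, 2],
    'tag': ['cool', 'nice', 'clever', 'clever', 'not-bad'],
})

# Same concatenation as above, passing the bound str.join method directly
tags_by_content = df.groupby('content_id')['tag'].agg(','.join)

# Assumed variant: drop repeated tags within each group before joining,
# keeping the order in which each tag first appears
unique_tags_by_content = (
    df.groupby('content_id')['tag']
      .agg(lambda tags: ','.join(tags.drop_duplicates()))
)

print(tags_by_content)
print(unique_tags_by_content)
```

With this data, content 2 was tagged 'clever' by two different users, so the plain join gives 'clever,clever,not-bad' while the deduplicated variant gives 'clever,not-bad'.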

Number of unique column values per group

How many unique users have tagged each piece of content?

import pandas as pd

df = pd.DataFrame({
    'user_id': [1, 2, 1, 3, 3],
    'content_id': [1, 1, 2, 2, 2],
    'tag': ['cool', 'nice', 'clever', 'clever', 'not-bad']
})

# count distinct user_id values within each content_id group
df.groupby("content_id")["user_id"].nunique().to_frame()

[Figure: Source dataframe]
[Figure: How many users tagged each content?]
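
To make the difference between distinct users and tag rows concrete, here is a small sketch that computes both side by side using named aggregation. The column names unique_users and tag_rows and the variable per_content are hypothetical labels chosen for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'user_id': [1, 2, 1, 3, 3],
    'content_id': [1, 1, 2, 2, 2],
    'tag': ['cool', 'nice', 'clever', 'clever', 'not-bad'],
})

# nunique counts distinct users per content; count counts every tag row
per_content = df.groupby('content_id')['user_id'].agg(
    unique_users='nunique',
    tag_rows='count',
)

print(per_content)
```

Content 2 has three tag rows but only two distinct users (user 3 tagged it twice), which is why nunique rather than count answers the question above.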

