Pandas Examples: Plotting Date/Time data with Matplotlib/Pyplot
Last updated:- Plot date histogram per month
- Fill missing days in between
- Stacked barplot count per date, percentages
- Plot by start of week
You need to have the matplotlib module installed for this!
Versions used: Pandas 1.3.x, matplotlib 3.4.x
See many more examples on plotting data directly from dataframes here: Pandas Dataframe: Plot Examples with Matplotlib and Pyplot
Examples on how to plot date/time data from within pandas dataframes, using various methods.
All code can be found on this jupyter notebook online
Plot date histogram per month
import pandas as pd
# source dataframe
df = pd.DataFrame({
'name':[
'alice', 'bob', 'charlie', 'david', 'edward', 'felicia'
],
'date_of_admission':[
'2021-10-28', '2022-01-24', '2022-03-25', '2022-04-11', '2022-03-01', '2022-03-30'
],
})
Now convert the date column into datetime type, use groupby()
and plot()
:
# convert column to datetime
df['date_of_admission'] = pd.to_datetime(df['date_of_admission'])
# create a representation of the month with strfmt
df['year_month_of_admission'] = df['date_of_admission'].map(lambda dt: dt.strftime('%Y-%m'))
grouped_df = df.groupby('year_month_of_admission').size().to_frame("count").reset_index()
grouped_df.plot(kind='bar', x='year_month_of_admission', y='count')
Fill missing days in between
For Offset Alias strings other than
'D'
, see this: Offset Aliases
We can add "fake" rows by using a DateTimeIndex
and the asfreq
methods.
This is useful when you have to plot aggregate data (counts, sums, etc per date) and you want to make sure that missing data is conspicuous, so that everyone can see if/when there's missing data.
import pandas as pd
# source data before aggregation
df = pd.DataFrame({
'item_id':[
'273','231', '133', '555', '123','234','342'
],
'purchase_date':[
'1988-12-28', '1988-12-24', '1988-12-25', '1988-12-25', '1988-12-26', '1988-12-28' , '1988-12-30',
],
})
# group by to find out how many purchases we had per day
grouped_df = df.groupby('purchase_date').size().to_frame("count").reset_index()
# here we set a new index using the datetime column
datetime_series = pd.to_datetime(grouped_df['purchase_date'])
datetime_index = pd.DatetimeIndex(datetime_series.values)
grouped_df.set_index(datetime_index, inplace=True)
# IMPORTANT! we can only add rows for missing periods
# if the dataframe is SORTED by the index
grouped_df.sort_index(inplace=True)
# we change the FREQUENCY of the dataframe using asfreq
grouped_df_filled_missing = grouped_df.asfreq('D')
# just plot as you normally would
grouped_df_filled_missing.plot(kind='bar', y='count')
Note that there is no data for 27th and 29th
Now for the plot results:
DateTimeIndex
we were able to fill the holes so to speak.
This makes it much clearer to viewers that
there were days with NO purchases
Stacked barplot count per date, percentages
Plot the number of visits a website had, per day and using another column (in this case browser
) as drill down.
Just use df.groupby()
, passing the DatetimeIndex
and an optional drill down column.
from datetime import date,datetime,timedelta
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import numpy as np
import pandas as pd
np.random.seed(42)
# create a dummy dataset
df = pd.DataFrame(
index=pd.to_datetime(
[datetime(2019,1,1)+timedelta(hours=h+10*np.random.uniform()) for h in range(0,1000)]
),
data={
'browser': np.random.choice([
'chrome',
'firefox',
'safari',
'edge'
],size=1000,replace=True)
},
)
print(type(df.index))
# 'pandas.core.indexes.datetimes.DatetimeIndex'
# just to make it more intuitive
df.index.name= 'login_datetime'
# you can pass a lambda function to the groupby function
# so that it groups by the day (or anything else you want)
df.groupby(by=[lambda dt: dt.strftime('%Y-%m-%d'),'browser']).size().groupby(level=0).apply(
lambda x: 100 * x / x.sum()
).unstack().plot(kind='bar',stacked=True,legend='reverse',width=0.8)
plt.gca().yaxis.set_major_formatter(mtick.PercentFormatter())
plt.show()
represents an individual visit to
a given website, and the browser
used.
website, per day and browser
Plot by start of week
TODO get start of the week, group by