Pandas Time Series Examples: DatetimeIndex, PeriodIndex and TimedeltaIndex

Last updated:
Table of Contents

WIP Alert This is a work in progress. Current information is correct but more content may be added in the future.

View all code in this jupyter notebook

For more examples on how to manipulate date and time values in pandas dataframes, see Pandas Dataframe Examples: Manipulating Date and Time

Use existing date column as index

If your dataframe already has a date column, you can use use it as an index, of type DatetimeIndex:

import pandas as pd

# this is the original dataframe
df = pd.DataFrame({
    'name':[
        'john','mary','peter','jeff','bill'
    ],
    'date_of_birth':[
        '2000-01-01', '1999-12-20', '2000-11-01', '1995-02-25', '1992-06-30',
    ],
})

print(df.index)
# RangeIndex(start=0, stop=5, step=1)

# convert the column (it's a string) to datetime type
datetime_series = pd.to_datetime(df['date_of_birth'])

# create datetime index passing the datetime series
datetime_index = pd.DatetimeIndex(datetime_series.values)

df2=df.set_index(datetime_index)

# we don't need the column anymore
df2.drop('date_of_birth',axis=1,inplace=True)

print(df2.index)
# DatetimeIndex(['2000-01-01', '1999-12-20', '2000-11-01', '1995-02-25',
#    '1992-06-30'], dtype='datetime64[ns]', freq=None)

original-dataframe-has-a-range-index BEFORE: If you don't specify an
index when creating a dataframe,
by default it's a RangeIndex
         
dataframe-now-has-datetimeindex AFTER: After setting the index to
the date column, the index is now
of type DatetimeIndex

Add row for empty periods

View all offset aliases here

import pandas as pd

df = pd.DataFrame({
    'name':[
        'john','mary','peter','jeff','bill'
    ],
    'year_born':[
        '2000', '1999', '2000', '1995', '1992',
    ],
})

df.index
# RangeIndex(start=0, stop=5, step=1)

# build a datetime index from the date column
datetime_series = pd.to_datetime(df['year_born'])
datetime_index = pd.DatetimeIndex(datetime_series.values)

# replace the original index with the new one
df3=df.set_index(datetime_index)

# we don't need the column anymore
df3.drop('year_born',axis=1,inplace=True)

# IMPORTANT! we can only add rows for missing periods
# if the dataframe is SORTED by the index
df3.sort_index(inplace=True)

df3.index
# DatetimeIndex(['1992-01-01', '1995-01-01', '1999-01-01', '2000-01-01',
#               '2001-01-01'],
#              dtype='datetime64[ns]', freq=None)

# 'YS' stands for 'YEAR START'
df4=df3.asfreq('YS')

df4.index
# DatetimeIndex(['1992-01-01', '1993-01-01', '1994-01-01', '1995-01-01',
#               '1996-01-01', '1997-01-01', '1998-01-01', '1999-01-01',
#               '2000-01-01', '2001-01-01'],
#              dtype='datetime64[ns]', freq='AS-JAN')

original-dataframe df: original dataframe
     
after-setting-datatime-index df3: after transforming the
date column into a DatetimeIndex.
Note that the years have
been converted to day-
based dates.
     
after-changing-frequency-and-filling-empty-rows df4: after calling asfreq(), extra
rows (in blue) have been
added for the missing periods.

Create lag columns using shift

template: .shift(<number_of_periods>, <offset_alias>) where the alias is one of 'D' for days, 'W' for weeks, etc.

In many cases you want to use values for previous dates as features in order to train classifiers, analyze data, etc.

from datetime import date

import numpy as np
import pandas as pd

# create a dummy dataset
df = pd.DataFrame(
    data={'reading': np.random.uniform(high=100,size=10)},
    index=pd.to_datetime([date(2019,1,d) for d in range(1,11)])
)

# D minus 1 = current day minus 1
df['reading_d_minus_1']=df['reading'].shift(1,freq='D')

# create columns for 2 days before as well
df['reading_d_minus_2']=df['reading'].shift(2,freq='D')

original-dataframe This dataframe has one reading of each day from
Jan 1, 2019 to Jan 10, 2019
         
dataframe-with-lagged-columns Two lagged columns were added to the right
of the original column. (See matching values in blue)
Note that there are NaNs (red) when
there is no previous data available.

Plot distribution per unit time

See many more examples on plotting data directly from dataframes here: Pandas Dataframe: Plot Examples with Matplotlib and Pyplot

Plot the number of visits a website had, per day and using another column (in this case browser) as drill down.

Just use df.groupby(), passing the DatetimeIndex and an optional drill down column.

from datetime import date,datetime,timedelta

import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import numpy as np
import pandas as pd

np.random.seed(42)

# create a dummy dataset
df = pd.DataFrame(
    index=pd.to_datetime(
        [datetime(2019,1,1)+timedelta(hours=h+10*np.random.uniform()) for h in range(0,1000)]
    ),
    data={
        'browser': np.random.choice([
            'chrome',
            'firefox',
            'safari',
            'edge'
        ],size=1000,replace=True)
    },

)

print(type(df.index))
# 'pandas.core.indexes.datetimes.DatetimeIndex'

# just to make it more intuitive
df.index.name= 'login_datetime'

# you can pass a lambda function to the groupby function
# so that it groups by the day (or anything else you want)
df.groupby(by=[lambda dt: dt.strftime('%Y-%m-%d'),'browser']).size().groupby(level=0).apply(
    lambda x: 100 * x / x.sum()
).unstack().plot(kind='bar',stacked=True,legend='reverse',width=0.8)

plt.gca().yaxis.set_major_formatter(mtick.PercentFormatter())
plt.show()

sample-dataframe Sample dataframe: each line
represents an individual visit to
a given website, and the browser
used.
         
size-by-day-plot Agreggated number of visits to
website, per day and browser

Dialogue & Discussion