Pandas Dataframe: Plot Examples with Matplotlib and Pyplot

All examples can be viewed in this sample Jupyter notebook

You need to have the matplotlib module installed for this!

Sample data for examples

import pandas as pd

df = pd.DataFrame({
    'name':['john','mary','peter','jeff','bill','lisa','jose'],
    'age':[23,78,22,19,45,33,20],
    'gender':['M','F','M','M','M','F','M'],
    'state':['california','dc','california','dc','california','texas','texas'],
    'num_children':[2,0,0,3,2,1,4],
    'num_pets':[5,1,0,5,2,2,3]
})

[Figure: this is what our sample dataset looks like.]

Pandas has tight integration with matplotlib.

You can plot data directly from your DataFrame using the plot() method:

Plot two dataframe columns as a scatter plot

import matplotlib.pyplot as plt
import pandas as pd

# a scatter plot comparing num_children and num_pets
df.plot(kind='scatter',x='num_children',y='num_pets',color='red')
plt.show()

[Figure: scatter plot of num_pets against num_children. Looks like we have a trend.]

Plot column values as a bar plot

import matplotlib.pyplot as plt
import pandas as pd

# a simple bar plot
df.plot(kind='bar',x='name',y='age')
plt.show()

[Figure: simple bar plot based on the DataFrame. 'kind' takes values such as 'bar', 'barh' (horizontal bars), etc.]
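
As a small illustration of the 'kind' argument, the sketch below draws the same data as horizontal bars with kind='barh'. It uses a cut-down copy of the sample DataFrame so that it runs on its own. Sorting by age is an addition of this sketch, not part of the example above.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'name':['john','mary','peter','jeff','bill','lisa','jose'],
    'age':[23,78,22,19,45,33,20],
})

# barh draws the first row at the bottom, so an ascending sort
# puts the longest bar at the top of the chart
ax = df.sort_values('age').plot(kind='barh',x='name',y='age')
ax.set_xlabel('age')
```

Call plt.show() afterwards to display the chart.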

Line plot with multiple columns

Just reuse the Axes object.

import matplotlib.pyplot as plt
import pandas as pd

# gca stands for 'get current Axes'
ax = plt.gca()

df.plot(kind='line',x='name',y='num_children',ax=ax)
df.plot(kind='line',x='name',y='num_pets', color='red', ax=ax)

plt.show()

[Figure: line plot with one line per column. plot() takes an optional argument 'ax',
which lets you reuse an Axes object to plot multiple lines.]
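
If the lines come from the same DataFrame, a shorter option is to pass a list of column names as y. The sketch below does that, and also creates the Axes explicitly with plt.subplots() instead of plt.gca(). It uses a cut-down copy of the sample DataFrame so that it runs on its own.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'name':['john','mary','peter','jeff','bill','lisa','jose'],
    'num_children':[2,0,0,3,2,1,4],
    'num_pets':[5,1,0,5,2,2,3],
})

# create a fresh Figure and Axes instead of relying on the current one
fig, ax = plt.subplots()

# one call draws one line per column listed in y
df.plot(kind='line',x='name',y=['num_children','num_pets'],ax=ax)
```

Passing ax explicitly is still the way to go when the lines come from different DataFrames. Call plt.show() afterwards to display the chart.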

Save plot to file

Instead of calling plt.show(), call plt.savefig('output.png'):

import matplotlib.pyplot as plt
import pandas as pd

df.plot(kind='bar',x='name',y='age')

# the plot gets saved to 'output.png'
plt.savefig('output.png')
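
A slightly fuller sketch of saving, for cases where you want to control the output. It uses a cut-down copy of the sample DataFrame so that it runs on its own. The dpi and bbox_inches values are illustrative choices, not something the example above requires.

```python
from pathlib import Path

import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'name':['john','mary','peter','jeff','bill','lisa','jose'],
    'age':[23,78,22,19,45,33,20],
})

# plot() returns the Axes it drew on; the Figure is reachable from it
ax = df.plot(kind='bar',x='name',y='age')
fig = ax.get_figure()

out = Path('output.png')
# dpi sets the resolution; bbox_inches='tight' trims surrounding whitespace
fig.savefig(out, dpi=150, bbox_inches='tight')

# release the figure once it has been written
plt.close(fig)
```

The file extension picks the format, so 'output.pdf' or 'output.svg' would work the same way.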

Bar plot with group by

import matplotlib.pyplot as plt
import pandas as pd

df.groupby('state')['name'].nunique().plot(kind='bar')
plt.show()

[Figure: number of unique names per state.]
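
It can help to look at the grouped result before plotting it. The sketch below, which uses a cut-down copy of the sample DataFrame, stores the intermediate Series in a variable (the name names_per_state is just for illustration) and then plots it.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'name':['john','mary','peter','jeff','bill','lisa','jose'],
    'state':['california','dc','california','dc','california','texas','texas'],
})

# a Series indexed by state, holding the number of distinct names
names_per_state = df.groupby('state')['name'].nunique()
print(names_per_state)
# california    3
# dc            2
# texas         2

# the index becomes the x axis, the values become the bar heights
ax = names_per_state.plot(kind='bar')
```

Call plt.show() afterwards to display the chart.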

Stacked bar plot with two-level group by

Just do a normal groupby() and call unstack():

import matplotlib.pyplot as plt
import pandas as pd

df.groupby(['state','gender'])['name'].size().unstack().plot(kind='bar',stacked=True)
plt.show()

[Figure: stacked bar chart showing the number of people
per state, split into males and females.]

Another example: count the people by gender, splitting by state:

import matplotlib.pyplot as plt
import pandas as pd

df.groupby(['gender','state'])['age'].size().unstack().plot(kind='bar',stacked=True)
plt.show()

[Figure: number of people by gender, with each bar split by state.]
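
To see what unstack() does, you can print the table it produces before plotting. This is a minimal sketch on a cut-down copy of the sample DataFrame. The fill_value=0 argument is an addition here: without it, combinations that never occur (such as females in california) show up as NaN.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'name':['john','mary','peter','jeff','bill','lisa','jose'],
    'gender':['M','F','M','M','M','F','M'],
    'state':['california','dc','california','dc','california','texas','texas'],
})

# rows are states, columns are genders, cells are counts
counts = df.groupby(['state','gender']).size().unstack(fill_value=0)
print(counts)

# each row becomes one bar, each column one coloured segment
ax = counts.plot(kind='bar',stacked=True)
```

Swapping the order of the keys in groupby() swaps rows and columns, which is how the second example above gets one bar per gender. Call plt.show() afterwards to display the chart.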

Stacked bar plot with percentage view, normalized to 100%

Sometimes you are only interested in the distribution, not the raw counts:

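import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import pandas as pd

# group_keys=False keeps the original (gender, state) index,
# so unstack() still turns 'state' into columns
df.groupby(['gender','state'])['age'].size().groupby(level=0, group_keys=False).apply(
    lambda x: 100 * x / x.sum()
).unstack().plot(kind='bar',stacked=True)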

plt.gca().yaxis.set_major_formatter(mtick.PercentFormatter())
plt.show()

[Figure: people grouped by gender and state, with normalized columns
so that each sums up to 100%.]
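
An alternative way to get the same percentages is pandas.crosstab() with normalize='index', which divides each row by its own total. This is a sketch of a different route to the same chart, not the method used above, and it uses a cut-down copy of the sample DataFrame.

```python
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import pandas as pd

df = pd.DataFrame({
    'gender':['M','F','M','M','M','F','M'],
    'state':['california','dc','california','dc','california','texas','texas'],
})

# normalize='index' turns counts into row fractions; multiply to get percent
pct = pd.crosstab(df['gender'], df['state'], normalize='index') * 100
print(pct)

ax = pct.plot(kind='bar',stacked=True)
# show the y axis ticks as percentages
ax.yaxis.set_major_formatter(mtick.PercentFormatter())
```

Each row of pct sums to 100, so every stacked bar has the same height. Call plt.show() afterwards to display the chart.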

Plot a histogram of column values

import matplotlib.pyplot as plt
import pandas as pd

df[['age']].plot(kind='hist',bins=[0,20,40,60,80,100],rwidth=0.8)
plt.show()

[Figure: histogram of age by bins. The most common age group is between 20 and 40 years old.]
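
To check the bar heights in the histogram above, you can count the values per bin yourself. The sketch below uses numpy.histogram(), which follows the same binning rule: each bin includes its left edge and excludes its right edge, except the last bin, which includes both. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

ages = pd.Series([23,78,22,19,45,33,20], name='age')
bins = [0,20,40,60,80,100]

# counts[i] is the number of ages falling in bins[i] .. bins[i+1]
counts, edges = np.histogram(ages, bins=bins)

for lo, hi, n in zip(edges[:-1], edges[1:], counts):
    print(f'{lo}-{hi}: {n}')
# 0-20: 1
# 20-40: 4
# 40-60: 1
# 60-80: 1
# 80-100: 0
```

Note that the age 20 lands in the 20-40 bin, not the 0-20 bin, because the right edge of a bin is exclusive.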

Date histogram

To plot a date histogram, you must first convert the date column to datetime using pandas.to_datetime().

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'name':['john','lisa','peter','carl','linda','betty'],
    'date_of_birth':[
        '01/21/1988','03/10/1977','07/25/1999','01/22/1977','09/30/1968','09/15/1970'
    ]
})

[Figure: the original DataFrame. Dates were added as strings in American (MM/DD/YYYY) format.]

Now convert the date column into datetime type and use plot(kind='hist'):

# an explicit format avoids any ambiguity between month and day
df['date_of_birth'] = pd.to_datetime(df['date_of_birth'],format='%m/%d/%Y')

plt.clf()
df['date_of_birth'].map(lambda d: d.month).plot(kind='hist')
plt.show()

[Figure: the column is now of type datetime64[ns],
even though the values still look like strings.]

[Figure: each value is a pandas Timestamp object.
Map each one to its month and plot.]
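
Because months are discrete values, a bar chart of counts per month can be easier to read than a histogram. The sketch below is one way to do that, assuming the same sample dates as above. It uses the .dt accessor to extract the month rather than map(), and the name births_per_month is just for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'name':['john','lisa','peter','carl','linda','betty'],
    'date_of_birth':[
        '01/21/1988','03/10/1977','07/25/1999','01/22/1977','09/30/1968','09/15/1970'
    ]
})

df['date_of_birth'] = pd.to_datetime(df['date_of_birth'],format='%m/%d/%Y')

# .dt.month extracts the month number from every row at once;
# sort_index() orders the bars by month rather than by count
births_per_month = df['date_of_birth'].dt.month.value_counts().sort_index()
print(births_per_month)

ax = births_per_month.plot(kind='bar')
```

Months with no births are simply absent from the chart. Call plt.show() afterwards to display it.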


References

  • A lot of other types of plot are available. See all of them here
