Python Pickling for Data Science: Examples and Tips on how to use Pickle as part of your Data Work


WIP Alert: This is a work in progress. The current information is correct, but more content may be added in the future.

Simplest possible example: dump a Python dictionary to a Pickle file

By convention, pickle files use a .p extension (.pkl and .pickle are also common).

import pickle

colors = { "john": "yellow", "mary": "red" }

# the with block closes the file as soon as the dump finishes
with open("colors.p", "wb") as f:
    pickle.dump(colors, f)

Simplest possible example: read a Python dictionary from a Pickle file

Loading a pickle can execute arbitrary code, so only read pickle files from sources you trust.

import pickle

# read back the file written in the example above
# (the function is pickle.load; the pickle module has no read function)
with open("colors.p", "rb") as f:
    colors = pickle.load(f)
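
To make the trust warning concrete, here is a minimal and deliberately harmless sketch of why loading a pickle amounts to running code. The class name Payload is hypothetical and exists only for illustration. Its __reduce__ method tells pickle which callable to invoke at load time. The sketch uses the built-in sorted, but a malicious file could name any importable callable.

```python
import pickle

class Payload:
    """Hypothetical class used only to illustrate the risk."""

    def __reduce__(self):
        # __reduce__ tells pickle how to rebuild the object:
        # "call this callable with these arguments" at load time.
        return (sorted, ([3, 1, 2],))

data = pickle.dumps(Payload())

# Loading calls sorted([3, 1, 2]); the result is a list, not a Payload.
result = pickle.loads(data)
print(result)  # [1, 2, 3]
```

The loader never checks whether the callable is safe, which is why a pickle file from an unknown source should be treated like an executable.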

Pickling a trained Scikit-learn model

A fitted model is an ordinary Python object, so pickle it the same way as the dictionary above, after you have called fit() (or fit_transform() or partial_fit()) on your model.
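
The following sketch shows the fit-then-pickle pattern without requiring Scikit-learn to be installed. MeanModel is a hypothetical stand-in that only mimics the fit()/predict() interface, and it is not part of Scikit-learn. With a real estimator, the pickle calls should be the same.

```python
import pickle

class MeanModel:
    """Hypothetical stand-in for an estimator; not part of scikit-learn."""

    def fit(self, X, y):
        # Learned state is stored on the instance, as fitted estimators do.
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        # Predict the training mean for every input row.
        return [self.mean_ for _ in X]

model = MeanModel().fit([[1], [2], [3]], [10.0, 20.0, 30.0])

# dumps/loads are the in-memory counterparts of dump/load.
blob = pickle.dumps(model)
restored = pickle.loads(blob)

print(restored.predict([[4], [5]]))  # [20.0, 20.0]
```

The pickle holds the fitted state but not the class code, so the class (or, for a real model, the library that defines it) must be importable when loading. For Scikit-learn models it is safest to load with the same library version that was used to save.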

Watch out for the encoding

Pickle is a binary data format so be sure to read and write the files using the binary flags ("rb" and "wb", respectively).
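
A small sketch of what happens when the binary flag is forgotten. The temporary directory is an assumption made here so the example does not touch your working directory.

```python
import os
import pickle
import tempfile

colors = {"john": "yellow", "mary": "red"}
path = os.path.join(tempfile.mkdtemp(), "colors.p")

# Text mode ("w") fails because pickle writes bytes, not str.
try:
    with open(path, "w") as f:
        pickle.dump(colors, f)
    text_mode_error = None
except TypeError as exc:
    text_mode_error = exc

print(type(text_mode_error).__name__)  # TypeError

# Binary mode ("wb" to write, "rb" to read) works as expected.
with open(path, "wb") as f:
    pickle.dump(colors, f)
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == colors)  # True
```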

Things that can't be pickled

  • Lambdas

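  • Code in classes and functions: pickle stores only a reference to the class (its module and name) plus the instance's data, so the class definition must be importable when the file is loaded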
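
A minimal sketch of the lambda limitation from the list above. The variable names are illustrative. The exact exception type can vary, so the sketch catches both likely candidates.

```python
import math
import pickle

square = lambda x: x * x

# A lambda has no importable name, so pickle cannot store a reference to it.
try:
    pickle.dumps(square)
    lambda_error = None
except (pickle.PicklingError, AttributeError) as exc:
    lambda_error = exc

print(type(lambda_error).__name__)

# A named, importable function is stored as a reference (module + name),
# not as code, so loading returns the very same function object.
restored_sqrt = pickle.loads(pickle.dumps(math.sqrt))
print(restored_sqrt(9.0))  # 3.0
```

If you need to pickle something that holds a function, a common workaround is to replace the lambda with a function defined with def at the top level of a module.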

Pickling objects/classes

TODO

Resources
