Quick Summary + Thoughts on BigHead: AirBNB's ML Platform

Problems to solve

Problems the platform aims to solve:

  • Support common ML frameworks such as scikit-learn

  • Cater to the different workflows of online vs. batch ML

  • Decrease development time and time-to-market

  • Reduce incidental complexity

  • Share features across teams

Main Design decisions

Design decisions:

  • Everything on Docker

  • Equivalence between online and offline models
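Here is a minimal sketch of what online/offline equivalence can mean in practice. It assumes the guarantee comes from routing the batch path and the per-request path through the same feature and scoring code. The functions `featurize`, `score`, `batch_predict` and `online_predict`, and the listing fields, are hypothetical and not BigHead's API:

```python
from typing import Dict, List


def featurize(listing: Dict[str, float]) -> List[float]:
    """Hypothetical feature transform shared by the batch and online paths."""
    price = listing["price"]
    nights = listing["nights"]
    # Derived features: total stay cost and a coarse price bucket.
    return [price * nights, float(price // 50)]


def score(features: List[float]) -> float:
    """Toy linear model standing in for a trained pipeline."""
    weights = [0.001, 0.5]
    return sum(w * x for w, x in zip(weights, features))


def batch_predict(listings: List[Dict[str, float]]) -> List[float]:
    # Offline path: score a whole table of records at once.
    return [score(featurize(row)) for row in listings]


def online_predict(listing: Dict[str, float]) -> float:
    # Online path: score a single request with exactly the same code.
    return score(featurize(listing))


rows = [{"price": 120.0, "nights": 3.0}, {"price": 80.0, "nights": 7.0}]
assert batch_predict(rows) == [online_predict(r) for r in rows]
```

Because both paths call the same functions, a batch backfill and a live request cannot drift apart in how features are computed.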

BigHead

BigHead Libraries

How does this relate to Aerosolve, Airbnb's older open-source ML library? Not sure. Aerosolve looks dead.

It's a collection of data-processing components you can use to define every step of a modelling pipeline.

For example, you can use the BigHead libraries to define preprocessing steps, choose which features your pipeline uses, etc.

The result is a regular scikit-learn Pipeline object on which you can call fit(), transform(), etc.

It looks like a scikit-learn Pipeline on steroids, with extra features for analyzing the features used, visualizing scores, inspecting the components, etc.
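To make the "Pipeline on steroids" idea concrete, here is a plain-Python sketch of a pipeline that chains fit()/transform() steps and exposes a simple inspection hook. It is not BigHead's API: `MiniPipeline`, `MeanCenter`, `SelectColumns` and `describe()` are hypothetical names, and the sketch deliberately avoids scikit-learn so it runs with the standard library alone:

```python
from typing import Any, List, Sequence, Tuple

Row = List[float]


class MeanCenter:
    """Subtracts the per-column mean learned in fit()."""

    def fit(self, X: Sequence[Row]) -> "MeanCenter":
        n = len(X)
        self.means_ = [sum(col) / n for col in zip(*X)]
        return self

    def transform(self, X: Sequence[Row]) -> List[Row]:
        return [[v - m for v, m in zip(row, self.means_)] for row in X]


class SelectColumns:
    """Keeps only the given column indices (a stand-in for feature selection)."""

    def __init__(self, columns: List[int]):
        self.columns = columns

    def fit(self, X: Sequence[Row]) -> "SelectColumns":
        return self  # stateless: nothing to learn

    def transform(self, X: Sequence[Row]) -> List[Row]:
        return [[row[i] for i in self.columns] for row in X]


class MiniPipeline:
    """Chains named steps; fit() fits each step on the previous step's output."""

    def __init__(self, steps: List[Tuple[str, Any]]):
        self.steps = steps

    def fit(self, X: Sequence[Row]) -> "MiniPipeline":
        for _, step in self.steps:
            X = step.fit(X).transform(X)
        return self

    def transform(self, X: Sequence[Row]) -> List[Row]:
        for _, step in self.steps:
            X = step.transform(X)
        return list(X)

    def describe(self) -> List[str]:
        # Inspection hook: report each step's name and type.
        return [f"{name}: {type(step).__name__}" for name, step in self.steps]


pipe = MiniPipeline([("center", MeanCenter()), ("select", SelectColumns([0]))])
X = [[1.0, 10.0], [3.0, 20.0]]
pipe.fit(X)
print(pipe.transform(X))  # [[-1.0], [1.0]]
print(pipe.describe())    # ['center: MeanCenter', 'select: SelectColumns']
```

Naming each step is what makes the inspection tooling possible: a wrapper can walk `steps` to report which features and transforms a model actually uses.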

BigHead Service

Model management component: keeps track of which model version is currently in use, keeps a history of previously used versions, etc.
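A minimal sketch of the bookkeeping such a service does, assuming it only needs to know the active version and the order of past deployments. `ModelRegistry`, `deploy()` and `rollback()` are hypothetical and say nothing about BigHead Service's real interface or storage:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Deployment:
    version: str
    deployed_at: datetime


@dataclass
class ModelRegistry:
    """Hypothetical registry: one active version per model plus a full history."""
    model_name: str
    history: List[Deployment] = field(default_factory=list)

    def deploy(self, version: str) -> None:
        # Record every deployment so earlier versions stay auditable.
        self.history.append(Deployment(version, datetime.now(timezone.utc)))

    @property
    def current(self) -> Optional[str]:
        # The most recent deployment is the one in use.
        return self.history[-1].version if self.history else None

    def rollback(self) -> str:
        # Re-deploy the version that was active before the current one.
        if len(self.history) < 2:
            raise ValueError("no earlier version to roll back to")
        previous = self.history[-2].version
        self.deploy(previous)
        return previous


registry = ModelRegistry("pricing")
registry.deploy("v1")
registry.deploy("v2")
print(registry.rollback())  # v1
print(registry.current)     # v1
```

Keeping the full history, rather than only the current version, is what makes rollbacks and audits possible.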

Zipline

Data management/feature management component.

Automatically builds Flink (streaming) and Spark (batch) jobs for data preprocessing.
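Zipline's actual feature-definition format isn't covered here. The sketch below only illustrates the general idea of declaring a feature once and generating the batch job from that declaration. The `Aggregation` class, the `event_date` column and the emitted Spark SQL are assumptions, and a real system would also generate the streaming (Flink) counterpart:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Aggregation:
    """Hypothetical declarative feature: aggregate a column per entity over a window."""
    name: str
    source_table: str
    entity_key: str
    column: str
    operation: str  # e.g. "SUM", "COUNT", "AVG"
    window_days: int


def to_spark_sql(features: List[Aggregation], as_of: str) -> str:
    """Render features sharing one source table and key into a single batch query."""
    if not features:
        raise ValueError("need at least one feature")
    table = features[0].source_table
    key = features[0].entity_key
    if any(f.source_table != table or f.entity_key != key for f in features):
        raise ValueError("all features must share a source table and entity key")
    selects = []
    for feat in features:
        # Assumes the source table has an event_date column; each feature only
        # aggregates rows inside its own trailing window ending at as_of.
        condition = f"event_date > date_sub('{as_of}', {feat.window_days})"
        selects.append(
            f"{feat.operation}(CASE WHEN {condition} THEN {feat.column} END) AS {feat.name}"
        )
    cols = ",\n  ".join([key] + selects)
    return f"SELECT\n  {cols}\nFROM {table}\nWHERE event_date <= '{as_of}'\nGROUP BY {key}"


features = [
    Aggregation("bookings_30d", "bookings", "listing_id", "booking_id", "COUNT", 30),
    Aggregation("revenue_7d", "bookings", "listing_id", "price", "SUM", 7),
]
print(to_spark_sql(features, "2018-06-01"))
```

Generating the query from a declaration means the feature logic lives in one place, which is also what lets the same definition be compiled to a streaming job.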

RedSpot

Jupyter notebooks as a service, used for prototyping and analysis.

Features:

  • Based on JupyterHub.

  • You can share environments so that people can work together, save notebooks to persistent storage, etc.

  • Runs on dedicated AWS instances, with attention to cost savings.

  • All environments are Dockerized.

BigQueue

Training environment

ML Automator

Deployment layer for offline (batch) models

Deep Thought

Deployment layer for online models.

It takes as input a serialized BigHead pipeline, wraps it in a Java REST service and builds a Docker image with it.

In addition, it adds supporting functionality such as logging, visualization, monitoring, etc.
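Deep Thought itself wraps the pipeline in a Java service. Purely to illustrate the shape of such a wrapper, here is a Python standard-library sketch that deserializes a pickled model once and scores JSON requests, with a latency log line standing in for monitoring. `LinearModel`, the `/predict` path, the `{"instances": [...]}` payload format and the default port are all assumptions:

```python
import json
import logging
import pickle
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("scoring")


class LinearModel:
    """Stand-in for a serialized pipeline: anything with a predict(rows) method."""

    def __init__(self, weights):
        self.weights = weights

    def predict(self, rows):
        return [sum(w * x for w, x in zip(self.weights, row)) for row in rows]


def handle_predict(model, body: bytes) -> bytes:
    """Decode a JSON request, score it, and encode the JSON response."""
    start = time.perf_counter()
    rows = json.loads(body)["instances"]
    predictions = model.predict(rows)
    # Monitoring hook: log batch size and latency for every request.
    log.info("scored %d rows in %.2f ms", len(rows), (time.perf_counter() - start) * 1000)
    return json.dumps({"predictions": predictions}).encode()


def make_server(model_bytes: bytes, port: int = 8080) -> HTTPServer:
    """Load the pickled model once and expose it at POST /predict."""
    model = pickle.loads(model_bytes)

    class Handler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/predict":
                self.send_error(404)
                return
            length = int(self.headers.get("Content-Length", 0))
            payload = handle_predict(model, self.rfile.read(length))
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

    return HTTPServer(("", port), Handler)
```

For example, `make_server(pickle.dumps(LinearModel([1.0, 2.0]))).serve_forever()` would accept `{"instances": [[1, 1]]}` at `/predict` on port 8080. Since `pickle.loads` can execute arbitrary code, a service like this should only load trusted model artifacts.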

