Quick Summary + Thoughts on BigHead: Airbnb's ML Platform

Last updated: 28 Aug 2021

Table of Contents

Problems to solve
Main Design decisions
BigHead

Problems to solve

Problems the platform aims to solve:

Support common ML frameworks such as scikit-learn, etc.
Cater to the different workflows for online vs batch ML
Decrease development time and time-to-market
Reduce incidental complexity
Share features across teams

Main Design decisions

Design decisions:

Everything on Docker
Equivalence between online and offline models
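
One way to read the equivalence goal is that the online (single request) and offline (batch) paths should share the same feature and scoring code, so they cannot drift apart. The sketch below illustrates that reading only; the feature names, the scoring formula and all function names are hypothetical and are not taken from BigHead.

```python
from typing import Dict, Iterable, List

Row = Dict[str, float]


def featurize(row: Row) -> List[float]:
    """Single definition of the feature logic (hypothetical features)."""
    price_per_guest = row["price"] / max(row["guests"], 1.0)
    return [price_per_guest, row["nights"]]


def score(features: List[float]) -> float:
    """Stand-in for a trained model: a fixed linear formula."""
    return 0.5 * features[0] + 2.0 * features[1]


def predict_online(row: Row) -> float:
    """Online path: score one request at a time."""
    return score(featurize(row))


def predict_batch(rows: Iterable[Row]) -> List[float]:
    """Offline path: score many rows, reusing the exact same code."""
    return [score(featurize(row)) for row in rows]


rows = [
    {"price": 120.0, "guests": 2.0, "nights": 3.0},
    {"price": 80.0, "guests": 0.0, "nights": 1.0},
]
batch_scores = predict_batch(rows)
online_scores = [predict_online(row) for row in rows]
```

Because both paths call the same `featurize` and `score`, a batch job and a live service return identical numbers for the same input.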

BigHead

BigHead Libraries

How does this relate to Aerosolve? Not sure. Aerosolve looks dead.

BigHead Libraries is a collection of data processing steps that you can use to define every stage of a modelling pipeline.

For example, you can use BigHead libraries to define preprocessing steps, what features you'll use in your pipeline, etc.

The result is a regular scikit-learn Pipeline object on which you can call fit(), transform(), etc.

It looks like a scikit-learn Pipeline on steroids, with extra features to help you analyze the features used, visualize scores, inspect the components, etc.
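
For readers who have not used it, this is what a plain scikit-learn Pipeline looks like. It is a minimal sketch using only standard scikit-learn classes, not the BigHead library itself (whose API is not shown here); the toy data and step names are made up for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data (made up): two numeric features, one missing value.
X = np.array([
    [1.0, 200.0],
    [2.0, np.nan],
    [3.0, 180.0],
    [10.0, 900.0],
    [11.0, 950.0],
    [12.0, 1000.0],
])
y = np.array([0, 0, 0, 1, 1, 1])

# Preprocessing steps and the model are declared as one object.
pipeline = Pipeline(steps=[
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("model", LogisticRegression()),
])

pipeline.fit(X, y)
predictions = pipeline.predict(X)

# Each component stays inspectable by name after fitting.
step_names = list(pipeline.named_steps)
imputer_medians = pipeline.named_steps["impute"].statistics_
```

The whole object can be fitted, inspected and serialized as a unit, which is presumably what makes it a convenient base to build extra tooling on.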

BigHead Service

Model management component: it keeps track of which model version is currently in use, keeps a history of previously used versions, etc.
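
To make the idea concrete, here is a minimal in-memory sketch of that kind of bookkeeping. The class and method names are hypothetical, and the real BigHead Service is presumably backed by persistent storage rather than Python dictionaries.

```python
from typing import Dict, List, Optional


class ModelRegistry:
    """Tracks the live version of each model and the history of deployments."""

    def __init__(self) -> None:
        self._current: Dict[str, str] = {}
        self._history: Dict[str, List[str]] = {}

    def deploy(self, model_name: str, version: str) -> None:
        """Mark `version` as the one in use and append it to the history."""
        self._history.setdefault(model_name, []).append(version)
        self._current[model_name] = version

    def current(self, model_name: str) -> Optional[str]:
        """Return the version in use, or None if the model was never deployed."""
        return self._current.get(model_name)

    def history(self, model_name: str) -> List[str]:
        """Return every version deployed so far, oldest first."""
        return list(self._history.get(model_name, []))


registry = ModelRegistry()
registry.deploy("pricing", "v1")
registry.deploy("pricing", "v2")

live_version = registry.current("pricing")
past_versions = registry.history("pricing")
```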

Zipline

Data management/feature management component.

It automatically builds Flink and Spark jobs for data preprocessing.

RedSpot

Jupyter notebooks as a service, used for prototyping and analysis.

Features:

Based on JupyterHub.
You can share environments so that people can work together, share notebooks to persistent storage, etc.
Runs on dedicated AWS instances, with attention to cost savings.
Environments are all Dockerized.

BigQueue

Training environment

ML Automator

Deployment layer for offline (batch) models.

Deep Thought

Deployment layer for online models.

It takes as input a serialized BigHead pipeline, wraps it in a Java REST service and builds a Docker image with it.

In addition, it adds supporting functionality such as logging, visualization, monitoring, etc.
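
The wrapping step can be sketched roughly as follows. This is a Python stand-in for illustration only (Deep Thought itself uses a Java service), the Docker build is left out, and the "serialized pipeline" here is just JSON-encoded linear weights. All names and the request format are assumptions.

```python
import json
from http.server import BaseHTTPRequestHandler
from typing import Callable, List, Type

Predictor = Callable[[List[float]], float]


def load_pipeline(serialized: bytes) -> Predictor:
    """Stand-in deserializer: rebuilds a linear scorer from JSON weights."""
    params = json.loads(serialized)
    weights, bias = params["weights"], params["bias"]

    def predict(features: List[float]) -> float:
        return bias + sum(w * x for w, x in zip(weights, features))

    return predict


def handle_predict(predict: Predictor, raw_body: bytes) -> bytes:
    """Turn a JSON request body into a JSON response body."""
    payload = json.loads(raw_body)
    return json.dumps({"prediction": predict(payload["features"])}).encode()


def make_handler(predict: Predictor) -> Type[BaseHTTPRequestHandler]:
    """Build an HTTP handler class that serves POST requests with `predict`."""

    class PredictHandler(BaseHTTPRequestHandler):
        def do_POST(self) -> None:
            length = int(self.headers.get("Content-Length", 0))
            body = handle_predict(predict, self.rfile.read(length))
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return PredictHandler


predict = load_pipeline(b'{"weights": [0.5, 2.0], "bias": 1.0}')
response = handle_predict(predict, b'{"features": [2.0, 3.0]}')
```

To actually serve requests you would pass `make_handler(predict)` to `http.server.HTTPServer` and call `serve_forever()`. Logging and monitoring would hook into `do_POST`.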

Felipe 03 Dec 2018 28 Aug 2021 ml-platforms