Quick Summary + Thoughts on BigHead: Airbnb's ML Platform
Problems to solve
Problems the platform aims to solve:
Support common ML frameworks such as scikit-learn
Cater to different workflows for online vs. batch ML
Decrease development time and time-to-market
Reduce incidental complexity
Share features across teams
Main Design decisions
Design decisions:
Everything on Docker
Equivalence between online and offline models
BigHead
BigHead Libraries
How does this relate to Aerosolve? Not sure. Aerosolve looks dead.
BigHead Libraries is a collection of data processing steps that you can use to define every stage of a modelling pipeline.
For example, you can use BigHead libraries to define preprocessing steps, what features you'll use in your pipeline, etc.
This returns a regular scikit-learn Pipeline object you can call fit(), transform(), etc. on.
Looks like a Scikit-learn Pipeline on steroids with more features to help you analyze the features used, visualize scores, inspect the components, etc.
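I don't have access to the BigHead library API, so as a rough illustration here is what the underlying idea looks like in plain scikit-learn: preprocessing steps and a model are chained into one Pipeline object. The toy data, column meanings and step names are my own assumptions, not anything taken from BigHead.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data (made up): column 0 is a numeric price, column 1 is a room-type code.
X = np.array([
    [50.0, 0],
    [80.0, 1],
    [120.0, 1],
    [200.0, 2],
    [65.0, 0],
    [150.0, 2],
])
y = np.array([0, 0, 1, 1, 0, 1])

# Preprocessing: scale the numeric column, one-hot encode the categorical one.
preprocess = ColumnTransformer([
    ("scale_price", StandardScaler(), [0]),
    ("encode_room", OneHotEncoder(handle_unknown="ignore"), [1]),
])

# The whole modelling pipeline is a single object with fit() / predict().
pipeline = Pipeline([
    ("preprocess", preprocess),
    ("model", LogisticRegression()),
])

pipeline.fit(X, y)
predictions = pipeline.predict(X)

# Individual components stay inspectable after fitting.
features = pipeline.named_steps["preprocess"].transform(X)
```

Because everything lives in one object, the same fitted pipeline can be serialized and handed to whatever does the scoring, which is presumably what makes the online/offline equivalence workable.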
BigHead Service
Model management component: keeps track of which model version is currently in use, keeps a history of previously used versions, etc.
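To make the idea concrete, here is a minimal in-memory sketch of that kind of bookkeeping. Everything in it (ModelRegistry, deploy, rollback and so on) is a hypothetical name I made up for illustration; the real BigHead Service is surely backed by persistent storage and does a lot more.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class Deployment:
    version: str
    deployed_at: datetime


@dataclass
class ModelRegistry:
    """Hypothetical registry: current version plus deployment history per model."""
    _history: Dict[str, List[Deployment]] = field(default_factory=dict)

    def deploy(self, model_name: str, version: str) -> None:
        # Every deployment is appended, so the history is never rewritten.
        record = Deployment(version, datetime.now(timezone.utc))
        self._history.setdefault(model_name, []).append(record)

    def current_version(self, model_name: str) -> Optional[str]:
        records = self._history.get(model_name)
        return records[-1].version if records else None

    def history(self, model_name: str) -> List[str]:
        return [r.version for r in self._history.get(model_name, [])]

    def rollback(self, model_name: str) -> str:
        # Rolling back is recorded as a new deployment of the previous version.
        records = self._history.get(model_name, [])
        if len(records) < 2:
            raise ValueError(f"no earlier version of {model_name!r} to roll back to")
        previous = records[-2].version
        self.deploy(model_name, previous)
        return previous


registry = ModelRegistry()
registry.deploy("pricing", "v1")
registry.deploy("pricing", "v2")
registry.rollback("pricing")
```

After these calls the current version of "pricing" is v1 again, and the history still shows that v2 was live in between.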
Zipline
Data management/feature management component.
Automatically builds Flink and Spark jobs for data preprocessing
RedSpot
Jupyter notebooks as a service, used for prototyping and analysis.
Features:
Based upon Jupyter Hub.
You can share environments so that people can work together, share notebooks to persistent storage, etc.
Dedicated AWS instances, attention to cost-savings
Environments are all Dockerized
BigQueue
Training environment
ML Automator
Deployment layer for offline (batch) models
Deep Thought
Deployment layer for online models.
It takes as input a serialized BigHead pipeline, wraps it in a Java REST service and builds a Docker image with it.
In addition, it adds support functionality such as logging, visualization, monitoring, etc.
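As a sketch of the "wrap a serialized pipeline in a REST service" idea, here is a tiny Python stand-in using only the standard library. Deep Thought itself uses a Java service, which I haven't seen; the /predict path, the JSON request shape, the port and the toy linear model below are all assumptions of mine.

```python
import json
import logging
import pickle
from http.server import BaseHTTPRequestHandler, HTTPServer

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("scoring-service")

# Stand-in for a serialized pipeline: pickled parameters of a toy linear model.
SERIALIZED_MODEL = pickle.dumps({"weights": [0.5, -0.25], "bias": 1.0})


def load_model(blob):
    """Deserialize the artifact and return a scoring function.

    Only unpickle artifacts from a trusted source.
    """
    params = pickle.loads(blob)

    def predict(row):
        return params["bias"] + sum(w * x for w, x in zip(params["weights"], row))

    return predict


PREDICT = load_model(SERIALIZED_MODEL)


def handle_predict(body):
    """Score a JSON request body; returns (HTTP status, response bytes)."""
    try:
        rows = json.loads(body)["rows"]
        scores = [PREDICT(row) for row in rows]
    except (ValueError, KeyError, TypeError) as exc:
        log.warning("bad request: %s", exc)
        return 400, json.dumps({"error": "invalid request"}).encode()
    log.info("scored %d rows", len(rows))
    return 200, json.dumps({"scores": scores}).encode()


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, body = handle_predict(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Start the service (blocks until interrupted)."""
    HTTPServer(("0.0.0.0", port), PredictHandler).serve_forever()
```

Calling serve() starts the service, and a POST to /predict with a body like {"rows": [[2.0, 4.0]]} returns the scores as JSON. Keeping the scoring logic in handle_predict, separate from the HTTP handler, means it can be tested without starting a server, and the whole thing would be easy to drop into a Docker image.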