- Problems to solve
- Main Design decisions
Problems to solve
Problems the platform aims to solve:
- Add support for common ML frameworks such as Scikit-learn, etc.
- Cater to different workflows for Online vs Batch ML
- Decrease development time and time-to-market
- Reduce incidental complexity
- Share features across teams
Main Design decisions
- Everything on Docker
- Equivalence between Online vs Offline Models
How does this relate to Aerosolve? Not sure. Aerosolve looks dead.
The BigHead library is a collection of data processing steps that you can use to define all steps in a modelling pipeline.
For example, you can use BigHead libraries to define preprocessing steps, what features you'll use in your pipeline, etc.
This returns a regular scikit-learn Pipeline object that you can call transform(), etc. on.
Looks like a Scikit-learn Pipeline on steroids with more features to help you analyze the features used, visualize scores, inspect the components, etc.
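To make the "regular scikit-learn Pipeline" part concrete, here is a minimal sketch using only plain scikit-learn. None of this is BigHead code: the step names, the toy data and the choice of estimators are assumptions for illustration, and BigHead's own wrappers presumably add the analysis and visualization extras on top of something like this.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data (made up): two numeric features, one missing value, binary labels.
X = np.array([[1.0, 20.0], [2.0, np.nan], [3.0, 40.0], [4.0, 50.0]])
y = np.array([0, 0, 1, 1])

# Preprocessing steps and the model live in one object.
pipeline = Pipeline(steps=[
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
    ("model", LogisticRegression()),
])
pipeline.fit(X, y)

# Components can be inspected by name after fitting.
scaler = pipeline.named_steps["scale"]

# Slicing off the final estimator gives a transform-only pipeline,
# so transform() returns the features the model actually sees.
features = pipeline[:-1].transform(X)
predictions = pipeline.predict(X)
```

Because preprocessing and model are fitted and stored together, the same object can be serialized and reused wherever predictions are needed.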
Model management component: used for keeping track of which model version is in use at the moment, keeping a history of used versions, etc.
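A rough sketch of what "track the live version and keep a history" could look like. This is not BigHead's API: the `ModelRegistry` class, its method names and the in-memory storage are all hypothetical, and a real system would persist this state somewhere durable.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple


class ModelRegistry:
    """Hypothetical in-memory registry of live model versions and their history."""

    def __init__(self) -> None:
        self._live: Dict[str, str] = {}
        self._history: Dict[str, List[Tuple[str, datetime]]] = {}

    def promote(self, model: str, version: str) -> None:
        """Make `version` the live one and record when that happened."""
        self._live[model] = version
        entry = (version, datetime.now(timezone.utc))
        self._history.setdefault(model, []).append(entry)

    def current(self, model: str) -> Optional[str]:
        """Return the live version, or None if the model was never promoted."""
        return self._live.get(model)

    def history(self, model: str) -> List[str]:
        """Return every version that has been live, oldest first."""
        return [version for version, _ in self._history.get(model, [])]

    def rollback(self, model: str) -> str:
        """Re-promote the version that was live before the current one."""
        entries = self._history.get(model, [])
        if len(entries) < 2:
            raise ValueError(f"no earlier version recorded for {model!r}")
        previous = entries[-2][0]
        self.promote(model, previous)
        return previous


registry = ModelRegistry()
registry.promote("pricing", "v1")
registry.promote("pricing", "v2")
rolled_back_to = registry.rollback("pricing")
```

Recording a rollback as a new history entry, rather than deleting the last one, keeps the audit trail of what was actually serving at any point in time.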
Data management/feature management component.
Automatically builds Flink and Spark jobs for data preprocessing
Jupyter notebooks as a service, used for prototyping and analysis.
Based upon Jupyter Hub.
You can share environments so that people can work together, share notebooks to persistent storage, etc.
Dedicated AWS instances, attention to cost-savings
Environments are all Dockerized
Deployment layer for offline (batch) models
Deployment layer for online models.
It takes as input a serialized BigHead pipeline, wraps it in a Java REST service and builds a Docker image with it.
In addition, it adds support functionality such as logging, visualization, monitoring, etc.
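The wrapping idea (serialized artifact in, request handler with logging out) can be sketched as follows. The real service is Java and ships as a Docker image; this Python version only illustrates the shape. The `PredictionService` class, the `{"instances": ...}` request format and the pickled dict of linear weights standing in for a serialized pipeline are all assumptions made for the example.

```python
import json
import logging
import pickle
import time
from typing import Tuple

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("model-service")


class PredictionService:
    """Hypothetical wrapper: loads a serialized model and serves JSON requests."""

    def __init__(self, serialized_model: bytes) -> None:
        # Only unpickle artifacts from a trusted source.
        params = pickle.loads(serialized_model)
        self._weights = params["weights"]
        self._bias = params["bias"]
        self.request_count = 0  # simple counter a monitoring hook could read

    def _predict(self, rows):
        # Stand-in for pipeline.predict(): a plain linear score per row.
        return [
            self._bias + sum(w * x for w, x in zip(self._weights, row))
            for row in rows
        ]

    def handle(self, body: bytes) -> Tuple[int, bytes]:
        """Handle one request body; return (HTTP-style status, JSON response)."""
        started = time.perf_counter()
        self.request_count += 1
        try:
            rows = json.loads(body)["instances"]
            status, payload = 200, {"predictions": self._predict(rows)}
        except (KeyError, TypeError, ValueError) as exc:
            status, payload = 400, {"error": str(exc)}
        elapsed_ms = (time.perf_counter() - started) * 1000
        log.info("status=%d latency_ms=%.2f", status, elapsed_ms)
        return status, json.dumps(payload).encode("utf-8")


# Stand-in for the serialized pipeline handed to the deployment layer.
artifact = pickle.dumps({"weights": [0.5, 1.5], "bias": 1.0})
service = PredictionService(artifact)
ok_status, ok_body = service.handle(b'{"instances": [[2.0, 4.0]]}')
bad_status, bad_body = service.handle(b"not json")
```

Keeping the handler independent of any web framework means the logging and counting logic stays the same whichever HTTP server ends up in front of it.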