A Quick Summary of Ensemble Learning Strategies

Last updated: 01 Sep 2017

Table of Contents

Simple Voting
Weighted Voting
Bagging
Stacking
Boosting

Ensemble learning refers to mixing the outputs of several underlying classifiers in various ways, in order to:

Get more accurate predictions than each model gives individually.
Generalize better, thus reducing the risk of overfitting.

Most Kaggle competitions are won by ensemble methods (as of 2018).

The main types of ensemble techniques are:

Simple Voting

Train each underlying model on the whole training data.
- The output of the ensemble model is the output of the majority of the underlying models.
- All underlying models have the same voting power.
The equivalent technique for regression is Model Averaging: the ensemble output is the mean of the underlying models' outputs.
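
As a rough illustration of the two rules above, here is a minimal sketch in plain Python. The function names `majority_vote` and `model_average` and the sample predictions are made up for this example; they are not taken from any library.

```python
from collections import Counter
from statistics import mean


def majority_vote(predictions):
    """Return the label predicted by the most models for one sample.

    Ties are broken in favour of the label that appears first.
    """
    return Counter(predictions).most_common(1)[0][0]


def model_average(predictions):
    """Regression counterpart: every model counts equally, so take the mean."""
    return mean(predictions)


# Hypothetical outputs of three underlying models for a single sample.
label = majority_vote(["cat", "dog", "cat"])
average = model_average([2.0, 3.0, 4.0])
```

If scikit-learn is available, its `VotingClassifier` with `voting="hard"` applies the same majority rule to fitted estimators.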

Weighted Voting

Train each underlying model on the whole training data.
- The output of the ensemble model is the class with the largest total vote weight, rather than the class with a simple majority of votes.
- Some models have more voting power than others.
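
A minimal sketch of weighted voting, assuming the weights are already known (for instance, chosen from each model's validation accuracy). The function name `weighted_vote` and the numbers are hypothetical.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Sum the weight behind each label and return the heaviest label."""
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


# Hypothetical case: two weak models say "dog", one strong model says "cat".
votes = ["cat", "dog", "dog"]
weights = [0.6, 0.2, 0.2]
winner = weighted_vote(votes, weights)
```

Here a plain head count would pick "dog", but the single model with weight 0.6 outweighs the other two combined, so the weighted result is "cat".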

Bagging

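Draw random subsets of the training data (sampling with replacement) and train one underlying model on each subset.
- The output of the ensemble model is the average of the underlying models' outputs (for classification, the majority vote).
- All underlying models have the same voting power.
Example: Random Forest is Bagging applied to Decision Trees, with the addition that each split only considers a random subset of the features.
The objective is to increase generalization power by reducing variance.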
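
The procedure can be sketched in plain Python as follows. The base learner is a tiny one-feature decision stump written for this example. All names (`fit_stump`, `bagging_fit`, `bagging_predict`) and the toy data are assumptions, not part of any library.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Fit a one-feature stump: one label if x <= threshold, another otherwise."""
    best = None
    for threshold in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda x: left_label if x <= threshold else right_label


def bagging_fit(xs, ys, n_models=25, seed=0):
    """Train one stump per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # indices may repeat
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Every model gets one equal vote; return the majority label."""
    return Counter(model(x) for model in models).most_common(1)[0][0]


# Toy data: class 0 for small x, class 1 for large x.
xs = [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
models = bagging_fit(xs, ys)
low = bagging_predict(models, 0.5)
high = bagging_predict(models, 9.5)
```

Each stump sees a slightly different sample and so places its threshold differently; the vote smooths out those differences.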

Stacking

Train each underlying model on the whole training data.
Train another model (e.g. Logistic Regression) to learn how to best combine the outputs of each underlying model.
The objective is to increase model accuracy and generalization power.
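
A minimal sketch of the idea for a regression problem, in plain Python. Instead of Logistic Regression, the combining model here is a two-weight least-squares fit, and both base models are deliberately weak. All function names and the toy data are assumptions made for this example. In practice the combining model is often trained on held-out predictions rather than on predictions for the same data the base models saw.

```python
def fit_mean(xs, ys):
    """Base model A: always predict the mean of the training targets."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line_through_origin(xs, ys):
    """Base model B: y = slope * x, with no intercept."""
    slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return lambda x: slope * x


def fit_meta(a, b, ys):
    """Least-squares weights for y ~ w_a * a + w_b * b (2x2 normal equations)."""
    s_aa = sum(p * p for p in a)
    s_ab = sum(p * q for p, q in zip(a, b))
    s_bb = sum(q * q for q in b)
    s_ay = sum(p * y for p, y in zip(a, ys))
    s_by = sum(q * y for q, y in zip(b, ys))
    det = s_aa * s_bb - s_ab * s_ab
    w_a = (s_ay * s_bb - s_by * s_ab) / det
    w_b = (s_by * s_aa - s_ay * s_ab) / det
    return w_a, w_b


def fit_stack(xs, ys):
    """Fit both base models, then learn how to combine their outputs."""
    model_a = fit_mean(xs, ys)
    model_b = fit_line_through_origin(xs, ys)
    a = [model_a(x) for x in xs]
    b = [model_b(x) for x in xs]
    w_a, w_b = fit_meta(a, b, ys)
    return lambda x: w_a * model_a(x) + w_b * model_b(x)


# Toy data following y = 2x + 1, which neither base model can fit alone.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
stacked = fit_stack(xs, ys)
prediction = stacked(10.0)
```

The constant model supplies the intercept and the through-origin line supplies the slope, so the learned combination recovers a relationship that neither base model captures alone.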

Boosting

Train a model on the whole training data.
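- Then train another model that focuses on the errors of the previous one: gradient boosting fits the residuals, while AdaBoost gives more weight to the misclassified examples.
- Repeat for a fixed number of rounds, or until the error stops improving.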
The objective is to increase the accuracy.
Examples: AdaBoost and XGBoost are variants of boosting.
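
The residual-fitting loop can be sketched in plain Python as below. This is a simplified form of gradient boosting for squared error, with regression stumps as the underlying models. It is not the AdaBoost or XGBoost algorithm, and the names `fit_regression_stump`, `boost`, `mse` and the `learning_rate` value are assumptions made for this example.

```python
def fit_regression_stump(xs, targets):
    """Fit a stump that predicts the mean target on each side of a threshold.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((t - left_mean) ** 2 for t in left)
               + sum((t - right_mean) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_rounds=50, learning_rate=0.5):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = sum(ys) / len(ys)
    stumps = []

    def predict(x):
        return base + learning_rate * sum(stump(x) for stump in stumps)

    for _ in range(n_rounds):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_regression_stump(xs, residuals))
    return predict


def mse(predict, xs, ys):
    """Mean squared error of a predictor on the given data."""
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data: a curve that a single stump cannot follow.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [x * x for x in xs]
error_after_1 = mse(boost(xs, ys, n_rounds=1), xs, ys)
error_after_50 = mse(boost(xs, ys, n_rounds=50), xs, ys)
```

Each round corrects part of what the previous rounds got wrong, so the training error after 50 rounds is lower than after one. The training error keeps falling with more rounds, which is why the number of rounds is usually chosen on held-out data.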

References

Toptal: Ensemble Methods in Machine Learning
Learn by marketing: Kaggle Competition Analysis
- Ensembles and XGBoost (a specific type of ensemble) win by a large margin.
MLWave: Kaggle Ensembling Guide
- Large document, with tons of examples and Kaggle competitions to try the methods in.
Cross Validated Answer to: Bagging Boosting and Stacking in Machine Learning
- Good overview of pros/cons of bagging and boosting.

Felipe 01 Sep 2017 01 Sep 2017 data-newsletter-4 machine-learning
