
Ensemble Models: Overview

Using the power of the crowd

2021-01-18 · 6 min read · by Zhenlin Wang · updated 2021-09-28

Overview

An important technique in machine learning is the ensemble model, a family that includes very popular methods such as bagging (bootstrap aggregating) and boosting. In the upcoming blogs, I will outline these models in detail and give comparisons where necessary. The mathematical proofs are omitted for simplicity. However, I highly recommend that interested readers take a look at the theoretical foundations of these models to gain good intuition about the ideas behind ensemble models.

Intro

Ensemble Machine Learning

1. Bagging

2. Boosting

3. Stacking (a code sketch of techniques 1-3 follows this list)

4. How base-learners are classified
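
Before comparing them, here is a minimal sketch of the first three techniques using scikit-learn. The library, the synthetic dataset, and the choice of base learners are my own assumptions for illustration; the post does not prescribe a particular implementation.

```python
# A minimal sketch of the three ensemble families using scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Toy dataset, purely illustrative.
X, y = make_classification(n_samples=500, random_state=0)

ensembles = {
    # Bagging: many trees fit on bootstrap samples, predictions combined.
    "bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=50),
    # Boosting: learners fit sequentially, each focusing on earlier errors.
    "boosting": AdaBoostClassifier(n_estimators=50),
    # Stacking: a meta-learner combines the base learners' predictions.
    "stacking": StackingClassifier(
        estimators=[
            ("tree", DecisionTreeClassifier()),
            ("lr", LogisticRegression(max_iter=1000)),
        ],
        final_estimator=LogisticRegression(max_iter=1000),
    ),
}

for name, model in ensembles.items():
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {score:.3f}")
```

Note that only the combination strategy changes between the three; the base learners themselves stay simple.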

Bagging vs Boosting

1. Selecting the best technique: Bagging or Boosting

2. Similarities between Bagging and Boosting

  1. Both are ensemble methods that build N learners from a single base learner.
  2. Both generate several training data sets by random sampling.
  3. Both make the final decision by averaging the N learners, or by taking the majority vote among them (see the sketch after this list).
  4. Both are good at reducing variance and provide higher stability than a single learner.
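
To make these shared traits concrete, below is a from-scratch sketch of the common recipe: resample the training set, fit N copies of one learner, then take a majority vote. The dataset, the tree learner, and N = 25 are illustrative assumptions rather than values from the post.

```python
# From-scratch sketch: N learners from 1 learner, via random sampling
# and majority voting.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)
rng = np.random.default_rng(0)

N = 25  # number of learners, chosen arbitrarily for the sketch
learners = []
for _ in range(N):
    # Random sampling with replacement (a bootstrap sample).
    idx = rng.integers(0, len(X), size=len(X))
    learners.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

# Majority voting: each learner casts one vote per example.
votes = np.stack([m.predict(X) for m in learners])  # shape (N, n_samples)
majority = (votes.mean(axis=0) > 0.5).astype(int)
print("training accuracy of the vote:", (majority == y).mean())
```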

3. Differences between Bagging and Boosting

  1. Bagging is the simplest way of combining predictions that belong to the same type, while Boosting is a way of combining predictions that belong to different types.
  2. Bagging aims to decrease variance, not bias, while Boosting aims to decrease bias, not variance.
  3. In Bagging each model receives equal weight, whereas in Boosting models are weighted according to their performance.
  4. In Bagging each model is built independently, whereas in Boosting new models are influenced by the performance of previously built models.
  5. In Bagging, different training data subsets are randomly drawn with replacement from the entire training dataset. In Boosting, every new subset contains the elements that were misclassified by previous models.
  6. Bagging tries to solve the over-fitting problem, while Boosting tries to reduce bias.
  7. If the classifier is unstable (high variance), we should apply Bagging. If the classifier is stable and simple (high bias), we should apply Boosting.
  8. Bagging extends to the Random Forest model, while Boosting extends to Gradient Boosting (both are run side by side in the sketch below).
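
To close, here is point 8 in code: the two named extensions run side by side under cross-validation. Hyperparameters are scikit-learn defaults and the data is synthetic, so treat this as a rough sketch rather than a tuned benchmark.

```python
# The two named extensions, side by side (defaults, not tuned).
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)

# Random Forest: bagging over trees, plus random feature subsets per split.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
# Gradient Boosting: trees fit sequentially to the current residual errors.
gb = GradientBoostingClassifier(n_estimators=100, random_state=0)

for name, model in [("random forest", rf), ("gradient boosting", gb)]:
    print(name, cross_val_score(model, X, y, cv=5).mean().round(3))
```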