Boosting

In machine learning, boosting is an ensemble meta-algorithm used primarily to reduce bias and variance in supervised learning, and a family of machine learning algorithms that convert weak learners into strong ones. Boosting grew out of the question posed by Kearns and Valiant (1988, 1989): "Can a set of weak learners create a strong learner?" A weak learner is defined as a classifier that is only slightly correlated with the true classification (it labels examples better than random guessing), whereas a strong learner is a classifier that is arbitrarily well correlated with the true classification. In boosting, models are trained sequentially on the data, with each model trying to compensate for the weaknesses of its predecessors; at each iteration, the weak rules from the individual classifiers are combined into a single strong prediction rule.
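
The sequential re-weighting idea can be made concrete with a small sketch. The following is a minimal, illustrative example rather than a reference implementation: decision stumps are trained one after another, the sample weights of misclassified points are increased so the next stump focuses on them, and the stumps are combined into a single weighted vote, much like the AdaBoost-style scheme described below. The dataset and constants are arbitrary.

```python
# Minimal sketch of sequential boosting via sample re-weighting (AdaBoost-style).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
y_signed = np.where(y == 1, 1, -1)          # work with labels in {-1, +1}

n_rounds = 50
weights = np.full(len(X), 1.0 / len(X))     # start with uniform sample weights
stumps, alphas = [], []

for _ in range(n_rounds):
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y_signed, sample_weight=weights)
    pred = stump.predict(X)
    err = np.clip(weights[pred != y_signed].sum(), 1e-10, 1 - 1e-10)
    alpha = 0.5 * np.log((1 - err) / err)        # more accurate stumps get a larger vote
    weights *= np.exp(-alpha * y_signed * pred)  # up-weight the misclassified points
    weights /= weights.sum()
    stumps.append(stump)
    alphas.append(alpha)

# Strong classifier: sign of the weighted vote over all weak learners.
vote = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
print("training accuracy:", (np.sign(vote) == y_signed).mean())
```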

Three Methods of Boosting

Boosting algorithms differ in how they create and aggregate weak learners during the sequential process. Three popular boosting methods are:

  • Adaptive Boosting, also known as AdaBoost: The AdaBoost algorithm was created by Yoav Freund and Robert Schapire. The method works iteratively, identifying misclassified data points and adjusting their weights to minimize the training error. Models continue to be added sequentially until the strongest predictor is produced.  
  • Gradient Boosting: Building on the work of Leo Breiman, Jerome H. Friedman developed gradient boosting, which works by sequentially adding predictors to an ensemble, each one correcting the errors of its predecessor. However, instead of changing the weights of data points as AdaBoost does, gradient boosting trains each new predictor on the residuals of the previous one (see the residual-fitting sketch after this list). The name gradient boosting is used because the method combines the gradient descent algorithm with boosting.  
  • Extreme Gradient Boosting, or XGBoost: XGBoost is an implementation of gradient boosting designed for computational speed and scale. It leverages multiple CPU cores so that learning can proceed in parallel during training (a brief usage sketch follows after this list). 
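
As a rough illustration of the residual-fitting idea behind gradient boosting, the sketch below repeatedly fits a shallow regression tree to the residuals of the current ensemble and adds its shrunken predictions to the running estimate. The dataset, learning rate, and tree depth are illustrative choices, not recommendations.

```python
# Minimal sketch of gradient boosting for squared error: fit each tree to residuals.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, noise=10.0, random_state=0)

learning_rate, n_rounds = 0.1, 100
prediction = np.full(len(y), y.mean())      # start from a constant prediction
trees = []

for _ in range(n_rounds):
    residuals = y - prediction              # what the ensemble still gets wrong
    tree = DecisionTreeRegressor(max_depth=3).fit(X, residuals)
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

print("training RMSE after boosting:", np.sqrt(np.mean((y - prediction) ** 2)))
```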
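
A brief usage sketch, assuming the separate xgboost package is installed: its scikit-learn-style XGBClassifier wrapper exposes an n_jobs parameter so tree construction can use several CPU cores during training. The hyperparameter values below are placeholders, not tuned settings.

```python
# Train an XGBoost classifier using multiple CPU cores in parallel.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1, n_jobs=4)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```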

Boosting Advantages

  • Easy to implement: Boosting can be used with several hyperparameter tuning options to improve the fit, and many boosting implementations include built-in routines for handling missing data, which reduces the preprocessing required. In Python, the scikit-learn ensemble module (sklearn.ensemble) makes it easy to apply popular boosting methods such as AdaBoost and gradient boosting, while XGBoost is distributed as a separate library with a compatible interface (see the sketch after this list).  
  • Reduced bias: Boosting combines multiple weak learners in sequence, with each iteration improving upon the predictions of the last. This approach helps reduce the high bias commonly seen in shallow decision tree and logistic regression models. 
  • Computational efficiency: Because boosting algorithms select only the features that increase predictive power during training, they can help reduce dimensionality and improve computational efficiency.  
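
As referenced in the list above, here is a short sketch of the sklearn.ensemble interface: AdaBoost and gradient boosting ship with scikit-learn, while XGBoost lives in its own package but follows the same fit/predict convention. The dataset and parameters are illustrative.

```python
# Compare two boosting classifiers from sklearn.ensemble with cross-validation.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, random_state=0)

ada = AdaBoostClassifier(n_estimators=100, random_state=0)
gbm = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, random_state=0)

for name, model in [("AdaBoost", ada), ("Gradient boosting", gbm)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```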

Challenges of Boosting  

  • Overfitting: Studies disagree about whether boosting helps reduce overfitting or makes it worse. It is listed as a challenge because, when overfitting does occur, the model's predictions fail to generalize to new datasets (one way to check for this is sketched after this list).  
  • Computationally intensive: Sequential training makes boosting difficult to scale. Because each estimator builds on its predecessors, boosting models are computationally expensive, although XGBoost addresses some of the scalability issues seen in other boosting implementations. Boosting can also be slower to train than bagging because of the large number of parameters that influence the model's behavior. 
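
One way to observe the generalization issue mentioned above is to track held-out error as boosting rounds are added, for example with scikit-learn's staged_predict. If the validation error starts rising while additional rounds are trained, those extra rounds are overfitting. The dataset and settings below are illustrative only.

```python
# Track validation error across boosting rounds to spot overfitting.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, flip_y=0.2, random_state=0)  # noisy labels
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier(n_estimators=300, random_state=0).fit(X_train, y_train)
val_error = [np.mean(pred != y_val) for pred in model.staged_predict(X_val)]
best_rounds = int(np.argmin(val_error)) + 1
print(f"lowest validation error after {best_rounds} of {len(val_error)} rounds")
```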

References

【1】https://www.ibm.com/cn-zh/topics/boosting