
Out-of-Bag Estimate

Out-of-bag estimation is a method of evaluating a learner using data that did not appear in its training (bootstrap) sample.

Definition of Out-of-Bag Estimation

In the bagging process of a random forest, each decision tree $g_t$ is trained on a bootstrap sample $\tilde{D}_t$ drawn with replacement from the data set $D$. The examples of $D$ that are never drawn into $\tilde{D}_t$ are called the out-of-bag (OOB) data of $g_t$. When the number of examples $N$ is large enough, the probability that a given example $(x_n, y_n)$ is out-of-bag for a particular tree is

$$\left(1 - \frac{1}{N}\right)^{N} \xrightarrow{N \to \infty} \frac{1}{e} \approx 0.368$$

Since each base classifier is built on a bootstrap sample of the training set, only about 63.2% of the original examples appear in that sample, while the remaining 36.8% are out-of-bag and can be used as a validation set for the base classifier.
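
A quick numerical check of this limit, sketched in Python:

```python
import math

# Probability that a fixed example (x_n, y_n) is never drawn in N draws
# with replacement, i.e. that it ends up out-of-bag for a single tree.
for N in (10, 100, 1_000, 10_000):
    p_oob = (1 - 1 / N) ** N
    print(f"N = {N:>6}: P(out-of-bag) = {p_oob:.4f}")

# The limit as N grows is 1/e (about 0.368), so roughly 63.2% of the data
# lands in the bootstrap sample and 36.8% stays out-of-bag.
print(f"1/e = {1 / math.e:.4f}")
```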

It has been proven that the out-of-bag estimate is an unbiased estimate of the generalization error of the ensemble classifier. In the random forest algorithm, the calculation of attribute (feature) importance, the strength of the ensemble, and the correlation between individual classifiers all rely on out-of-bag data.
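
As a minimal sketch of this in practice, scikit-learn's RandomForestClassifier reports an out-of-bag accuracy estimate through its oob_score_ attribute when fitted with oob_score=True; the synthetic dataset and hyperparameters below are illustrative assumptions rather than recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data stands in for a real training set.
X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=5, random_state=0)

# bootstrap=True (the default) enables bagging; oob_score=True makes each
# sample be scored only by the trees whose bootstrap sample excluded it.
forest = RandomForestClassifier(n_estimators=200, bootstrap=True,
                                oob_score=True, random_state=0)
forest.fit(X, y)

# Out-of-bag accuracy: a generalization estimate obtained without holding
# out a separate validation set.
print("OOB accuracy estimate:", forest.oob_score_)
```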

Uses of Out-of-Bag Estimates

  • When the base learner is a decision tree, out-of-bag samples can be used to assist pruning, or to estimate the posterior probability of each node in the tree so that nodes receiving no training samples can still be handled;
  • When the base learner is a neural network, out-of-bag samples can serve as a validation set for early stopping to reduce overfitting (a sketch of extracting such an out-of-bag validation split follows this list).
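
As a minimal sketch of how such an out-of-bag validation split can be obtained for a single base learner, the helper below (bootstrap_split is a hypothetical name, not a library function) draws one bootstrap sample and returns the indices left out of it:

```python
import numpy as np

def bootstrap_split(n_samples, rng):
    """Hypothetical helper: draw one bootstrap sample and return
    (in-bag indices, out-of-bag indices)."""
    in_bag = rng.integers(0, n_samples, size=n_samples)  # sample with replacement
    oob_mask = np.ones(n_samples, dtype=bool)
    oob_mask[in_bag] = False                             # never drawn -> out-of-bag
    return in_bag, np.flatnonzero(oob_mask)

rng = np.random.default_rng(0)
in_bag, oob = bootstrap_split(10_000, rng)

# Roughly 63.2% of the examples appear in-bag and 36.8% are out-of-bag;
# the out-of-bag indices can serve as the validation set for pruning a
# tree or early-stopping a neural network trained on the in-bag part.
print("in-bag fraction    :", len(np.unique(in_bag)) / 10_000)
print("out-of-bag fraction:", len(oob) / 10_000)
```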
Parent term: Random Forest Algorithm