HyperAI

Pre-pruning

Pre-pruning is a type of pruning algorithm for decision trees in which pruning is performed while the tree is being generated, before each node is split. Its counterpart is post-pruning, which prunes the tree after it has been fully generated.

As the decision tree grows, each node is evaluated before it is split. If splitting the node would not improve the generalization performance of the tree (for example, as estimated on a held-out validation set), the split is abandoned and the node is marked as a leaf.
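A minimal sketch of this per-node check follows. It assumes that generalization performance is estimated as accuracy on a held-out validation set and that every leaf predicts the majority class of its training instances; both are illustrative assumptions, and the function names are hypothetical.

```python
from collections import Counter


def majority(labels):
    """Most common class label (ties go to the label encountered first)."""
    return Counter(labels).most_common(1)[0][0]


def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def should_split(train_X, train_y, val_X, val_y, feature):
    """Pre-pruning check for one node: split only if validation accuracy improves.

    train_X / val_X are lists of feature tuples that reach this node;
    feature is the index of the candidate (categorical) split feature.
    """
    # Validation accuracy if the node stays a leaf predicting the majority class.
    leaf_label = majority(train_y)
    leaf_acc = accuracy([leaf_label] * len(val_y), val_y)

    # Validation accuracy if the node is split: each child predicts its own majority.
    child_labels = {}
    for x, y in zip(train_X, train_y):
        child_labels.setdefault(x[feature], []).append(y)
    child_pred = {v: majority(ys) for v, ys in child_labels.items()}
    # Feature values unseen in training fall back to the parent's majority class.
    split_preds = [child_pred.get(x[feature], leaf_label) for x in val_X]
    split_acc = accuracy(split_preds, val_y)

    # Keep the split only if it strictly improves the estimated performance.
    return split_acc > leaf_acc
```

For instance, `should_split([(0,), (0,), (1,), (1,)], ["a", "a", "b", "b"], [(0,), (1,)], ["a", "b"], 0)` returns `True`, because the split raises validation accuracy from 0.5 to 1.0.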

Common strategies for pre-pruning

  • Set a maximum height (depth): once the tree reaches this height, it stops growing.
  • Stop growing at a node when all of the instances that reach it have the same feature vector, even if they do not all belong to the same class. This rule is effective for handling conflicting data.
  • Set a threshold on the number of instances: stop growing at a node when fewer instances than the threshold reach it.
  • Set a threshold on the gain: for each candidate expansion, compute the gain it brings to system performance, and stop growing if the gain is below the threshold.
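The four strategies above can be combined in a single recursive tree builder. The sketch below is an illustration under stated assumptions, not a reference implementation: features are categorical and each instance is a hashable tuple, the "gain" is entropy-based information gain (as used in ID3), and the parameter names `max_depth`, `min_samples`, and `min_gain` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(X, y, feature):
    """Entropy reduction from splitting on one categorical feature."""
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(x[feature], []).append(label)
    remainder = sum(len(g) / len(y) * entropy(g) for g in groups.values())
    return entropy(y) - remainder


def build_tree(X, y, depth=0, max_depth=3, min_samples=2, min_gain=0.01):
    """Grow a tree of nested dicts, applying the four pre-pruning rules."""
    leaf = Counter(y).most_common(1)[0][0]  # majority-class prediction
    if len(set(y)) == 1:
        return leaf  # pure node: nothing left to split
    if depth >= max_depth:  # rule 1: maximum height reached
        return leaf
    if len(set(X)) == 1:  # rule 2: identical feature vectors (possibly conflicting labels)
        return leaf
    if len(y) < min_samples:  # rule 3: too few instances at this node
        return leaf
    best = max(range(len(X[0])), key=lambda f: info_gain(X, y, f))
    if info_gain(X, y, best) < min_gain:  # rule 4: best gain below the threshold
        return leaf
    children = {}
    for value in set(x[best] for x in X):
        idx = [i for i, x in enumerate(X) if x[best] == value]
        children[value] = build_tree([X[i] for i in idx], [y[i] for i in idx],
                                     depth + 1, max_depth, min_samples, min_gain)
    return {"feature": best, "children": children, "default": leaf}


def predict(tree, x):
    """Follow splits until a leaf label is reached."""
    while isinstance(tree, dict):
        tree = tree["children"].get(x[tree["feature"]], tree["default"])
    return tree
```

Each rule returns the majority-class leaf as soon as it fires, so the order of the checks only affects which rule is credited with stopping, not the resulting tree.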

Advantages and disadvantages of pre-pruning

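  • Advantages: avoids unnecessary node expansion, which reduces both training time and testing time to some extent.
  • Disadvantages: carries a risk of underfitting. Because the decision at each node is greedy, a split that brings no immediate improvement is never made, even if later splits built on it would have improved performance significantly.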
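A classic illustration of this underfitting risk is XOR-labeled data. In the self-contained sketch below, entropy-based information gain is assumed as the gain measure. Neither feature alone yields any gain at the root, so a gain-threshold rule would stop there. A depth-2 tree, however, separates the data perfectly.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(X, y, feature):
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(x[feature], []).append(label)
    return entropy(y) - sum(len(g) / len(y) * entropy(g) for g in groups.values())


# XOR: the label is 1 exactly when the two binary features differ.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 1, 1, 0]

# At the root, neither split looks useful, so pre-pruning would stop here.
root_gains = [info_gain(X, y, f) for f in range(2)]
print(root_gains)  # [0.0, 0.0]

# Had the root been split on feature 0, feature 1 would separate each child perfectly.
child_gains = []
for value in (0, 1):
    idx = [i for i, x in enumerate(X) if x[0] == value]
    sub_X, sub_y = [X[i] for i in idx], [y[i] for i in idx]
    child_gains.append(info_gain(sub_X, sub_y, 1))
print(child_gains)  # [1.0, 1.0]
```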
Parent term: pruning
Related term: post-pruning