
Post-Pruning

Post-pruning refers to pruning performed after the decision tree has been fully generated. The method starts from the complete tree, allowing it to overfit the training data; then, for any node whose split is not supported with sufficient confidence, the subtree rooted at that node is replaced by a leaf, and the leaf's class label is set to the most frequent class among the samples in that subtree.

The post-pruning process examines groups of nodes that share the same parent and checks whether merging them would increase the entropy by less than a given threshold. If the increase stays below the threshold, the group is merged into a single node that covers all of the possible outcomes.
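As a minimal sketch of that merge test, the following Python compares the entropy of the merged node against the weighted entropy of the split; the `entropy` helper and the default threshold value are illustrative assumptions, not part of any specific algorithm:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def should_merge(left_labels, right_labels, threshold=0.1):
    """Merge two sibling leaves when the entropy increase caused by
    merging them stays below the threshold."""
    merged = left_labels + right_labels
    n_l, n_r, n = len(left_labels), len(right_labels), len(merged)
    # Weighted entropy of keeping the split vs. entropy after merging.
    split_entropy = (n_l / n) * entropy(left_labels) + (n_r / n) * entropy(right_labels)
    return entropy(merged) - split_entropy < threshold
```

For example, `should_merge(["A", "A", "B"], ["A", "B", "B"], threshold=0.1)` returns True, because the split separates the classes only slightly better than the merged node would.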

Post-pruning method

Split the test data according to the trained decision tree, then proceed bottom-up (see the sketch after this list):

  • If a subset corresponds to a subtree rather than a leaf, recursively apply the pruning process to that subtree;
  • Compute the error of the node both with and without merging its leaves;
  • If merging reduces the error, merge the leaf nodes into a single leaf.
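A sketch of this bottom-up procedure on a toy tree representation; the nested-dict layout and the helper names are assumptions chosen for illustration, not a reference implementation:

```python
def majority_class(labels):
    """Most frequent class label among a node's training samples."""
    return max(set(labels), key=labels.count)

def predict(tree, x):
    """Route a sample down the (possibly pruned) tree to a class label."""
    while isinstance(tree, dict):
        tree = tree["left"] if x[tree["feature"]] <= tree["value"] else tree["right"]
    return tree

def prune(tree, test_rows):
    """Post-prune a tree bottom-up: collapse a split into a single leaf
    whenever that does not increase the error on held-out test data.

    A tree is either a leaf (a plain class label) or a dict:
    {"feature": i, "value": v, "left": ..., "right": ..., "labels": [...]}
    where "labels" holds the training labels that reached the node.
    """
    if not isinstance(tree, dict):  # already a leaf
        return tree

    # Split the test rows the same way the node splits training data.
    left_rows = [(x, y) for x, y in test_rows if x[tree["feature"]] <= tree["value"]]
    right_rows = [(x, y) for x, y in test_rows if x[tree["feature"]] > tree["value"]]

    # Recursively prune the subtrees first.
    tree["left"] = prune(tree["left"], left_rows)
    tree["right"] = prune(tree["right"], right_rows)

    # Error if we keep the split vs. error if we merge into one leaf.
    split_error = sum(1 for x, y in test_rows if predict(tree, x) != y)
    leaf_label = majority_class(tree["labels"])
    merge_error = sum(1 for x, y in test_rows if y != leaf_label)

    return leaf_label if merge_error <= split_error else tree
```

Merging when the errors tie (`<=`) follows Occam's razor: prefer the smaller tree when the test data cannot distinguish the two.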

List of post-pruning algorithms

1) Reduced-Error Pruning (REP);

2) Pessimistic Error Pruning (PEP);

3) Cost-Complexity Pruning (CCP);

4) Error-Based Pruning (EBP).
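Of these, cost-complexity pruning is available directly in scikit-learn via the `ccp_alpha` parameter. A minimal sketch follows; the dataset and the practice of selecting alpha on the test split are assumptions for illustration (in practice one would use cross-validation):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Compute the effective alphas along the cost-complexity pruning path.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

# Refit the tree at each alpha and keep the one that scores best on held-out data.
best = max(
    (DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_train, y_train)
     for a in path.ccp_alphas),
    key=lambda clf: clf.score(X_test, y_test),
)
print(best.get_n_leaves(), best.score(X_test, y_test))
```

Larger values of `ccp_alpha` prune more aggressively, trading training accuracy for a smaller tree.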

Comparison between pre-pruning and post-pruning

The threshold used in pre-pruning is very sensitive: a small change in its value can alter the entire tree. By comparison, post-pruning schemes generally produce better results.

Post-pruning retains more branches and therefore carries a lower risk of underfitting than pre-pruning. However, because post-pruning starts from a fully trained decision tree and scans it bottom-up, layer by layer, its training time and cost are greater than those of pre-pruning.

Compared with pre-pruning, post-pruning schemes are more common in practice, mainly because it is difficult in pre-pruning to accurately estimate when to stop growing the tree.

Parent word: pruning
Related word: pre-pruning