Overfitting
Overfitting is a phenomenon in machine learning in which a model learns attributes of the training samples that are irrelevant to the classification task. The resulting model (for example, a learned decision tree) is no longer the optimal model, and its generalization performance decreases.
The impact of overfitting
In statistics and machine learning, overfitting describes a model that has fit the random errors or noise in the data rather than the underlying relationship. It usually occurs when the model is too complex, for example when it has too many parameters. Overfitting weakens the predictive performance of the model and increases the variance of its predictions.
What can you do to avoid overfitting?
Many factors can lead to overfitting, but the usual cause is a model whose learning capacity is too strong. If you blindly pursue predictive accuracy on the training data, the complexity of the selected model often ends up higher than that of the true model, which causes overfitting.
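The effect of excess model complexity can be seen in a minimal sketch (the quadratic target, noise level, and polynomial degrees below are illustrative choices, not from the original text): a degree-9 polynomial fit to ten noisy samples of a quadratic function nearly interpolates the training points, yet predicts held-out points worse than the simpler degree-2 fit.

```python
# Fit polynomials of increasing degree to a few noisy samples of y = x^2.
# The over-complex model tracks the noise and generalizes worse.
import numpy as np

rng = np.random.default_rng(0)

# True model: y = x^2, observed with Gaussian noise
x_train = np.linspace(-1, 1, 10)
y_train = x_train**2 + rng.normal(scale=0.1, size=x_train.shape)
x_test = np.linspace(-1, 1, 100)
y_test = x_test**2

for degree in (2, 9):
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_err:.4f}, test MSE {test_err:.4f}")
```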
To avoid overfitting, additional techniques are needed, such as cross-validation, regularization, early stopping, the Bayesian Information Criterion (BIC), the Akaike Information Criterion (AIC), or model comparison, which indicate when further training no longer improves generalization.
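As a concrete example of cross-validation, here is a minimal sketch using scikit-learn (an assumption; the original names no library, and the dataset and model choices are illustrative). It compares an unconstrained decision tree against a depth-limited one on held-out folds:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# The simpler, depth-limited model often scores higher on held-out folds
# even though the deep tree fits the training data perfectly.
for depth in (None, 3):
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    scores = cross_val_score(model, X, y, cv=5)
    print(f"max_depth={depth}: mean CV accuracy {scores.mean():.3f}")
```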
How to address overfitting
1) Re-clean the data. This method suits cases where the data is noisy or mislabeled;
2) Increase the number of training samples;
3) Reduce the complexity of the model;
4) Increase the coefficient of the regularization term (see the first sketch after this list);
5) Use Dropout;
6) Use early stopping;
7) Reduce the number of training iterations;
8) Increase the learning rate (the extra gradient noise from larger steps can act as an implicit regularizer);
9) Add noise to the training data (data augmentation);
10) Prune the tree structure (see the second sketch after this list).
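Items 4), 5), and 6) can be combined in a single training loop. The following is a hedged PyTorch sketch (assuming torch is available; the network, toy data, and hyperparameters are placeholders, not from the original text): weight_decay adds the L2 regularization term, nn.Dropout implements item 5, and validation-based patience implements early stopping.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy regression data: a noisy linear target split into train/validation.
X = torch.randn(200, 10)
y = X @ torch.randn(10, 1) + 0.1 * torch.randn(200, 1)
X_train, y_train, X_val, y_val = X[:150], y[:150], X[150:], y[150:]

model = nn.Sequential(
    nn.Linear(10, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # item 5: randomly zero activations during training
    nn.Linear(64, 1),
)
# weight_decay adds the L2 regularization penalty of item 4 to the update.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
loss_fn = nn.MSELoss()

best_val, patience, bad_epochs = float("inf"), 10, 0
for epoch in range(500):
    model.train()
    optimizer.zero_grad()
    loss_fn(model(X_train), y_train).backward()
    optimizer.step()

    model.eval()
    with torch.no_grad():
        val_loss = loss_fn(model(X_val), y_val).item()
    # Item 6: stop once validation loss has not improved for `patience` epochs.
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            print(f"early stop at epoch {epoch}, best val loss {best_val:.4f}")
            break
```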
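For item 10), one way to prune is scikit-learn's cost-complexity pruning via the ccp_alpha parameter (a sketch under assumed choices; the dataset and alpha grid are illustrative). Larger ccp_alpha prunes more aggressively, trading training accuracy for a simpler tree that usually generalizes better:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Sweep the pruning strength and measure generalization by cross-validation.
for alpha in (0.0, 0.005, 0.02):
    tree = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0)
    score = cross_val_score(tree, X, y, cv=5).mean()
    print(f"ccp_alpha={alpha}: mean CV accuracy {score:.3f}")
```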