
Bias-variance Tradeoff

In statistics and machine learning, the bias-variance tradeoff describes the relationship between a model's complexity, the accuracy of its predictions, and how well it predicts on previously unseen data that was not used to train the model. In general, as the number of adjustable parameters in a model increases, the model becomes more flexible and can fit the training dataset better. However, for more flexible models there tends to be greater variance in the fitted model each time a new training dataset is sampled.
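As a rough illustration of this claim, here is a minimal NumPy sketch (not from the source; the sine-curve data-generating process, the polynomial degrees, and the query point are all assumptions made for the example). The prediction of a flexible high-degree polynomial at a fixed input varies far more across independently drawn training sets than that of a rigid low-degree fit:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_training_set(n=30):
    """Draw a fresh training set from an assumed noisy sine-curve process."""
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, n)
    return x, y

x_query = 0.25                                # fixed point at which the fitted models are compared

for degree in (1, 9):                         # rigid model vs. flexible model
    predictions = []
    for _ in range(200):                      # 200 independently drawn training sets
        x, y = sample_training_set()
        coeffs = np.polyfit(x, y, degree)     # more parameters -> closer fit to this particular set
        predictions.append(np.polyval(coeffs, x_query))
    predictions = np.array(predictions)
    print(f"degree {degree}: mean prediction {predictions.mean():+.3f}, "
          f"variance across training sets {predictions.var():.3f}")
```

Typically the degree-9 fit tracks each individual training set closely, but its prediction at the query point fluctuates much more from one sampled dataset to the next than the degree-1 fit's does.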

The bias-variance dilemma or bias-variance problem is the conflict that arises when trying to minimize these two sources of error simultaneously. These two sources of error prevent supervised learning algorithms from generalizing beyond their training set:

  • Bias error is the error caused by incorrect assumptions in the learning algorithm. High bias may cause the algorithm to miss the relevant relationship between features and target output (underfitting).
  • Variance is the error caused by sensitivity to small fluctuations in the training set. High variance can result from an algorithm modeling the random noise in the training data (overfitting). The decomposition sketched after this list makes both terms precise.
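These two terms are usually made precise by the standard decomposition of the expected squared prediction error at a point x (not spelled out in the text above, but standard; here the data follow y = f(x) + ε with E[ε] = 0 and Var(ε) = σ², and f̂ is the model fitted to a random training set):

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```

The expectations are taken over random draws of the training set (and the noise); the σ² term is the noise floor that no model can remove.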

The bias-variance tradeoff is a core issue in supervised learning. The ideal is to choose a model that accurately captures regularities in the training data while also generalizing well to unseen data. Unfortunately, it is often impossible to achieve both. A learning method with high variance may be able to represent its training set well, but runs the risk of overfitting to noisy or unrepresentative training data. Conversely, algorithms with high bias often produce simpler models that may not capture important regularities in the data (i.e., underfit).
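The tradeoff can be seen by comparing training error with error on held-out data as model complexity grows. The following sketch (again an assumed sine-curve setup, using only NumPy and polynomial fits; it is an illustration, not a prescribed method) averages over many random train/held-out splits. Training error typically keeps falling as the degree rises, while held-out error first falls (less underfitting) and then rises again (overfitting):

```python
import numpy as np

rng = np.random.default_rng(2)
degrees = (1, 3, 5, 9)
train_mse = {d: [] for d in degrees}
test_mse  = {d: [] for d in degrees}

for _ in range(100):                               # average over many random splits
    x = rng.uniform(0.0, 1.0, 40)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, 40)
    x_tr, y_tr, x_te, y_te = x[:20], y[:20], x[20:], y[20:]
    for d in degrees:
        coeffs = np.polyfit(x_tr, y_tr, d)         # fit on the training half only
        train_mse[d].append(np.mean((np.polyval(coeffs, x_tr) - y_tr) ** 2))
        test_mse[d].append(np.mean((np.polyval(coeffs, x_te) - y_te) ** 2))

for d in degrees:
    print(f"degree {d}: train MSE {np.mean(train_mse[d]):.3f}, "
          f"held-out MSE {np.mean(test_mse[d]):.3f}")
```

The complexity that minimizes held-out error sits between the underfitting and overfitting extremes, which is exactly the balance the tradeoff describes.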
