HyperAI

Oversampling

Oversampling means increasing the number of samples of the minority class in the training set in order to reduce class imbalance.

The corresponding method is undersampling, which means reducing the number of samples drawn from the majority class in the training set.

Comparison between oversampling and undersampling

Random oversampling duplicates minority-class examples to increase their count, while random undersampling discards majority-class examples at random.

A drawback of oversampling is that it also replicates any errors in the duplicated examples, which can lead to overfitting. Conversely, undersampling can make the variance of the independent variables appear higher than it actually is.
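The two resampling strategies above can be sketched in a few lines. This is a minimal illustration using toy label lists; the function names `random_oversample` and `random_undersample` are hypothetical, not part of any particular library.

```python
import random

def random_oversample(majority, minority, seed=0):
    """Randomly duplicate minority examples until both classes have equal size."""
    rng = random.Random(seed)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra

def random_undersample(majority, minority, seed=0):
    """Randomly keep only as many majority examples as there are minority ones."""
    rng = random.Random(seed)
    kept = rng.sample(majority, len(minority))
    return kept + minority

# Toy imbalanced training set: 8 majority labels vs. 2 minority labels.
majority = ["pos"] * 8
minority = ["neg"] * 2

balanced_over = random_oversample(majority, minority)
balanced_under = random_undersample(majority, minority)
print(len(balanced_over), balanced_over.count("neg"))    # 16 8
print(len(balanced_under), balanced_under.count("pos"))  # 4 2
```

Note that oversampling grows the training set (each minority example may appear several times, which is how errors get replicated), while undersampling shrinks it by discarding majority examples.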

Oversampling and class imbalance

Class imbalance refers to an uneven distribution of classes in the training set used to train a classifier. For example, in a binary classification problem with 1,000 training samples, the ideal situation is that the numbers of positive and negative samples are similar; if instead there are 995 positive samples and only 5 negative samples, the training set is class-imbalanced.

Class imbalance prevents the model from learning to distinguish the minority class, which biases its predictions.

Class imbalance can be mitigated by oversampling, undersampling, or adjusting the decision threshold θ. Oversampling and undersampling adapt the sampling strategy to the class counts in the training set, which ultimately alleviates the imbalance.
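The third remedy, adjusting the decision threshold θ, needs no resampling at all: the classifier's predicted probabilities are kept, and only the cutoff for calling an example positive is moved. A minimal sketch, assuming a binary classifier that outputs a probability for the rare class (the probabilities below are made up for illustration):

```python
def classify(prob_rare, theta=0.5):
    """Label an example as the rare class when its predicted probability meets theta."""
    return "rare" if prob_rare >= theta else "common"

# Hypothetical predicted probabilities for five examples.
probs = [0.95, 0.7, 0.55, 0.4, 0.1]

default = [classify(p, theta=0.5) for p in probs]
lowered = [classify(p, theta=0.3) for p in probs]  # lower theta to favor the rare class
print(default)  # ['rare', 'rare', 'rare', 'common', 'common']
print(lowered)  # ['rare', 'rare', 'rare', 'rare', 'common']
```

Lowering θ trades more false positives for fewer missed minority examples, which is often the right trade when the minority class is the one that matters.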

Parent term: Sampling
Related term: undersampling