HyperAI

Data Augmentation

Data augmentation is a technique that artificially enlarges a training set by creating modified copies of existing data. It is commonly used in deep learning and covers both making small changes to existing samples and using deep learning models to generate new data points. Its main purpose is to enlarge the training set and make it as diverse as possible, so that the trained model generalizes better. The major deep learning frameworks already ship with built-in data augmentation tools.
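The "small changes" idea can be sketched in a few lines of plain Python. This is only an illustration, not the method of any particular framework: the function names `augment_image` and `augment_dataset`, the choice of transforms (a random horizontal flip plus light Gaussian noise), and the `noise_std` and `copies` parameters are all assumptions made for this example. Images are represented as lists of rows of floats in [0, 1].

```python
import random

def augment_image(image, rng, noise_std=0.05):
    """Return a modified copy of `image`, a list of rows of floats in [0, 1]."""
    # Random horizontal flip with probability 0.5 (applied to the whole image).
    flip = rng.random() < 0.5
    rows = [row[::-1] if flip else list(row) for row in image]
    # Small Gaussian noise on every pixel, clipped back into [0, 1].
    return [[min(1.0, max(0.0, px + rng.gauss(0.0, noise_std))) for px in row]
            for row in rows]

def augment_dataset(images, labels, copies, seed=0):
    """Keep the originals and append `copies` augmented versions of each image."""
    rng = random.Random(seed)
    out_images, out_labels = list(images), list(labels)
    for _ in range(copies):
        for image, label in zip(images, labels):
            out_images.append(augment_image(image, rng))
            out_labels.append(label)  # these transforms do not change the label
    return out_images, out_labels
```

With `copies=3`, a training set of N images grows to 4N images while every label is reused unchanged. In practice the built-in augmentation utilities of a deep learning framework would normally replace a hand-written loop like this.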

Scenarios for Using Data Augmentation

  1. To prevent model overfitting.
  2. To compensate for an initial training set that is too small.
  3. To improve model accuracy.
  4. To reduce the operational costs of labeling and cleaning raw datasets.

Limitations of Data Augmentation

  • The biases in the original dataset are still present in the augmented data.
  • Quality assurance for data augmentation is costly. 
  • Building augmentation systems for advanced applications requires further research and development. For example, generating high-resolution images using GANs can be challenging.
  • Finding effective data augmentation methods can be challenging.
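
The first limitation above, that biases carry over, can be checked with simple arithmetic. The sketch below assumes a hypothetical, imbalanced label set (90 "cat" and 10 "dog" samples) and a uniform scheme in which every sample receives the same number of augmented copies. The helper `class_shares` and all the numbers are made up for illustration.

```python
from collections import Counter

def class_shares(labels):
    """Return the fraction of the dataset that belongs to each class."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.items()}

# Hypothetical imbalanced dataset: 90% cats, 10% dogs.
labels = ["cat"] * 90 + ["dog"] * 10

# Uniform augmentation: every sample gets `copies` modified versions,
# each of which keeps the original label.
copies = 4
augmented_labels = labels * (1 + copies)

before = class_shares(labels)
after = class_shares(augmented_labels)
```

The dataset grows from 100 to 500 samples, yet `before` and `after` hold the same class shares: the 9-to-1 imbalance of the original data is reproduced exactly in the augmented set.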