Pool-Based Sampling
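Pool-based sampling is a popular active learning method that selects the most informative examples for labeling. It starts from a pool of unlabeled data: the current model scores the examples in the pool, and those judged most informative (for example, the ones the model is least certain about) are sent for manual annotation. The newly labeled examples are added to the training set, the model is retrained, and the process repeats until the labeling budget is used up or performance stops improving.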
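The select, label, retrain loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the pool is a hypothetical grid of 1-D points, `oracle` stands in for the human annotator (with a made-up true boundary at 0.6), the model is a simple threshold classifier, and "most informative" is taken to mean "closest to the current decision boundary".

```python
def oracle(x):
    """Hypothetical annotator: the true label is 1 when x >= 0.6."""
    return int(x >= 0.6)

def fit_threshold(labeled):
    """Toy model: place the threshold midway between the two classes."""
    largest_neg = max(x for x, y in labeled if y == 0)
    smallest_pos = min(x for x, y in labeled if y == 1)
    return (largest_neg + smallest_pos) / 2

def pool_based_sampling(pool, seed_labeled, budget):
    """Run `budget` rounds of select -> label -> retrain."""
    pool = list(pool)
    labeled = list(seed_labeled)
    threshold = fit_threshold(labeled)
    for _ in range(budget):
        if not pool:
            break
        # Select: the unlabeled point the model is least sure about,
        # taken here to be the one nearest the decision boundary.
        query = min(pool, key=lambda x: abs(x - threshold))
        pool.remove(query)
        # Label: ask the (hypothetical) annotator.
        labeled.append((query, oracle(query)))
        # Retrain on everything labeled so far.
        threshold = fit_threshold(labeled)
    return threshold, labeled

# Assumed setup: 999 unlabeled points and one seed label per class.
pool = [i / 1000 for i in range(1, 1000)]
seed = [(0.0, 0), (1.0, 1)]
threshold, labeled = pool_based_sampling(pool, seed, budget=15)
print(f"learned threshold: {threshold:.4f} from {len(labeled)} labels")
```

With a real model, `fit_threshold` would be replaced by ordinary training and the distance to the boundary by an uncertainty score computed from the model's predictions.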
Advantages
- Reduced labeling costs: Unlike traditional supervised learning, which labels the whole dataset up front, pool-based sampling labels only the samples the model finds most informative. This can save significant cost, especially with large datasets.
- Effective use of expert time: Experts only need to label the most informative samples, so their time is spent where it helps the model most.
- Improved model accuracy: Because the selected samples are more likely to be informative, pool-based sampling can reach higher accuracy than random selection for the same number of labels.
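The labeling-cost argument can be made concrete with a toy comparison. The sketch below spends the same labeling budget in two ways, once by querying the point nearest the current decision boundary and once by drawing points at random, and then measures how tightly the labels pin down the boundary. Everything in it is hypothetical: the 1-D grid pool, the oracle with a true boundary at 0.6, and the threshold model are assumptions chosen for illustration, and real datasets will not behave this cleanly.

```python
import random

BOUNDARY = 0.6  # hypothetical ground truth, known only to the oracle

def oracle(x):
    """Stand-in for a human annotator."""
    return int(x >= BOUNDARY)

def boundary_gap(labeled):
    """Width of the interval in which the boundary could still lie."""
    largest_neg = max(x for x, y in labeled if y == 0)
    smallest_pos = min(x for x, y in labeled if y == 1)
    return smallest_pos - largest_neg

def label_actively(pool, seed, budget):
    """Query the pool point closest to the current threshold each round."""
    pool, labeled = list(pool), list(seed)
    for _ in range(budget):
        largest_neg = max(x for x, y in labeled if y == 0)
        smallest_pos = min(x for x, y in labeled if y == 1)
        threshold = (largest_neg + smallest_pos) / 2
        query = min(pool, key=lambda x: abs(x - threshold))
        pool.remove(query)
        labeled.append((query, oracle(query)))
    return labeled

def label_randomly(pool, seed, budget, rng):
    """Baseline: label `budget` points drawn uniformly from the pool."""
    return list(seed) + [(x, oracle(x)) for x in rng.sample(pool, budget)]

pool = [i / 1000 for i in range(1, 1000)]
seed = [(0.0, 0), (1.0, 1)]
budget = 15

active_gap = boundary_gap(label_actively(pool, seed, budget))
random_gap = boundary_gap(label_randomly(pool, seed, budget, random.Random(0)))
print(f"active gap: {active_gap:.3f}  random gap: {random_gap:.3f}")
```

In this setup the active strategy behaves like a binary search over the pool, so a handful of labels narrows the boundary down to neighbouring pool points, while random labels leave a gap at least as wide.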
Disadvantages
- Selection of the unlabeled data pool: The model can only learn from what the pool contains, so the quality of the pool affects model performance and the pool must be assembled carefully. This can be challenging, especially for large and complex datasets.
- Quality of the selection method: The method used to choose the most informative samples directly affects model accuracy. If it is poorly designed or does not suit the data, the selected samples may add little, and accuracy suffers.
- Not suitable for all data types: Pool-based sampling may not be suitable for all types of data, such as unstructured or noisy data. In these cases, other active learning methods may be more appropriate.
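As an illustration of why the choice of selection method matters, the sketch below scores the same pool with three common uncertainty measures: least confidence, margin, and entropy. The predicted class probabilities are made up for the example, and the sample names `a`, `b`, and `c` are hypothetical.

```python
import math

def least_confidence(probs):
    """1 minus the probability of the most likely class (higher = more uncertain)."""
    return 1.0 - max(probs)

def margin(probs):
    """Gap between the two most likely classes (smaller = more uncertain)."""
    top, second = sorted(probs, reverse=True)[:2]
    return top - second

def entropy(probs):
    """Shannon entropy of the predicted distribution (higher = more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical predicted class probabilities for three unlabeled samples.
pool_probs = {
    "a": [0.90, 0.05, 0.05],
    "b": [0.50, 0.49, 0.01],
    "c": [0.40, 0.30, 0.30],
}

# Each strategy picks the sample it considers most informative.
pick_lc = max(pool_probs, key=lambda k: least_confidence(pool_probs[k]))
pick_margin = min(pool_probs, key=lambda k: margin(pool_probs[k]))
pick_entropy = max(pool_probs, key=lambda k: entropy(pool_probs[k]))
print(pick_lc, pick_margin, pick_entropy)
```

On these made-up numbers the margin criterion prefers a different sample from the other two, so the same pool can yield different queries depending on the selection method.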
References
[1] https://encord.com/glossary/pool-based-sampling/