
TXL-PBC: a freely accessible labeled peripheral blood cell dataset

Gan, Lu ; Li, Xi
Abstract

In a recent study, we found that the publicly available BCCD and BCD datasets have significant issues, including labeling errors, insufficient sample size, and poor data quality. To address these problems, we performed sample deletion, re-labeling, and integration of these two datasets. Additionally, we introduced the PBC and Raabin-WBC datasets, and ultimately created a high-quality, sample-balanced new dataset, which we named TXL-PBC. The dataset contains 1008 training samples, 288 validation samples, and 144 test samples. First, the dataset underwent strict manual annotation, automatic annotation with the YOLOv8n model, and manual audit steps to ensure the accuracy and consistency of annotations. Second, we address the blood cell mislabeling problem of the original datasets. The distribution of label bounding box areas and the number of labels are better than those of the BCCD and BCD datasets. Moreover, we trained the YOLOv8n model on each of these three datasets, and the performance on the TXL-PBC dataset surpasses that on the original two datasets. Finally, we employed the YOLOv5n, YOLOv5s, YOLOv5l, YOLOv8s, and YOLOv8m detection models as baseline models for TXL-PBC. This study not only enhances the quality of the blood cell dataset but also supports researchers in improving models for blood cell target detection. We published our freely accessible TXL-PBC dataset at https://github.com/lugan113/TXL-PBC_Dataset.
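The split sizes quoted in the abstract (1008, 288, and 144) sum to 1440 and correspond to a 70/20/10 partition. The abstract does not say how the partition was produced, so the following is only a minimal sketch of one common approach: a seeded shuffle of sample indices cut into three disjoint parts. The function name `split_indices` and the `seed` parameter are hypothetical and do not come from the dataset repository.

```python
import random


def split_indices(n_total, n_train, n_val, n_test, seed=0):
    """Shuffle range(n_total) and cut it into three disjoint index lists.

    Hypothetical helper for illustration; not the authors' procedure.
    """
    if n_train + n_val + n_test != n_total:
        raise ValueError("split sizes must sum to n_total")
    indices = list(range(n_total))
    # A dedicated Random instance keeps the shuffle reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(indices)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


# Sizes taken from the abstract: 1008 + 288 + 144 = 1440.
train_idx, val_idx, test_idx = split_indices(1440, 1008, 288, 144)
fractions = [len(part) / 1440 for part in (train_idx, val_idx, test_idx)]
print(fractions)
```

Fixing the seed makes the partition reproducible, so a detector trained by one group is evaluated on the same held-out samples as one trained by another.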
