CytoImageNet: A large-scale pretraining dataset for bioimage transfer learning
Hua, Stanley Bryan Z. ; Lu, Alex X. ; Moses, Alan M.

Abstract
Motivation: In recent years, image-based biological assays have steadily become high-throughput, sparking a need for fast automated methods to extract biologically-meaningful information from hundreds of thousands of images. Taking inspiration from the success of ImageNet, we curate CytoImageNet, a large-scale dataset of openly-sourced and weakly-labeled microscopy images (890K images, 894 classes). Pretraining on CytoImageNet yields features that are competitive with ImageNet features on downstream microscopy classification tasks. We show evidence that CytoImageNet features capture information not available in ImageNet-trained features. The dataset is made available at https://www.kaggle.com/stanleyhua/cytoimagenet.