Shanghai Jiao Tong University Releases MedMNIST Medical Image Analysis Dataset & New Benchmark

Medical image analysis is a very complex interdisciplinary field. Recently, Shanghai Jiao Tong University released the MedMNIST dataset, which is expected to promote the development of medical image analysis.
The headache of medical image analysis
Medical image analysis is a recognized "difficult" topic.
First of all, it is an interdisciplinary field. Practitioners need a broad knowledge background. Even if you are a computer vision specialist or a clinical medicine practitioner, at best you have only taken half a step towards medical image analysis.
Suppose, optimistically, that after years of study and research you have finally mastered both computer vision and clinical medicine. The next step brings a new worry: the data come from many sources, including X-ray, CT, and ultrasound, and it is hard to analyze and process so many non-standardized datasets across different modalities.
And that is not all. Although deep learning has dominated the research and application of medical image analysis, the labor cost of tuning models is too high. AutoML could help, but there are currently almost no AutoML benchmarks for medical image classification.

Medical image analysis is fraught with difficulties, but the MedMNIST dataset recently released by Shanghai Jiao Tong University provides a powerful tool to solve these long-standing problems.
10 public datasets, 450,000 images reorganized
MedMNIST is a collection of 10 public medical datasets. All data have been preprocessed and split into standard training, validation, and test subsets. The data sources cover different imaging modalities such as X-ray, OCT, ultrasound, and CT, so the collection as a whole is multimodal. Like the MNIST dataset, MedMNIST supports classification tasks on lightweight 28×28 images.

MedMNIST has the following characteristics:
Educational: The multimodal data comes from multiple public medical image datasets released under Creative Commons (CC) or other free licenses, which makes it easy to use for teaching.
Standardized: All data has been preprocessed into the same format, lowering the entry barrier and making it usable by anyone.
Diverse: The multimodal dataset covers different imaging modalities, dataset sizes ranging from about 100 to 100,000 samples, and a rich set of task types such as binary classification, multi-class classification, ordinal regression, and multi-label classification.
Lightweight: The 28×28 image size facilitates rapid prototyping, fast iteration, and experimentation with multimodal machine learning and AutoML algorithms.
MedMNIST Dataset
Publisher: Shanghai Jiao Tong University
Number of images: 454,591
Data format: NPZ
Data size: 654 MB
Release date: October 28, 2020
Download address: http://dwz.date/dew2
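Because the data is distributed in NPZ format, a subset can be inspected with NumPy alone. The following is a minimal sketch, not the official loader: the key names (train_images, train_labels, and so on) and the label shape are assumptions that this article does not state, and a tiny synthetic archive stands in for a real download so the example runs on its own.

```python
import os
import tempfile

import numpy as np

# Assumed split names and key layout ("<split>_images", "<split>_labels").
# The article does not list the keys, so treat these as hypothetical.
SPLITS = ("train", "val", "test")


def make_toy_npz(path, sizes=(8, 2, 4), num_classes=2, seed=0):
    """Write a small synthetic archive that mimics the assumed layout."""
    rng = np.random.default_rng(seed)
    arrays = {}
    for split, n in zip(SPLITS, sizes):
        # 28x28 grayscale images stored as unsigned 8-bit integers.
        arrays[f"{split}_images"] = rng.integers(
            0, 256, size=(n, 28, 28), dtype=np.uint8
        )
        # One integer class label per image.
        arrays[f"{split}_labels"] = rng.integers(
            0, num_classes, size=(n, 1), dtype=np.uint8
        )
    np.savez(path, **arrays)


def load_splits(path):
    """Return {split: (images, labels)} read from an NPZ archive."""
    with np.load(path) as data:
        return {s: (data[f"{s}_images"], data[f"{s}_labels"]) for s in SPLITS}


toy_path = os.path.join(tempfile.mkdtemp(), "toy_medmnist.npz")
make_toy_npz(toy_path)
splits = load_splits(toy_path)
for name, (images, labels) in splits.items():
    print(name, images.shape, labels.shape)
```

To try this on a downloaded subset, point load_splits at the real file and first check data.files to confirm which keys the archive actually contains.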
A decathlon-style approach creates a new benchmark for AutoML
Inspired by the Medical Segmentation Decathlon, researchers from Shanghai Jiao Tong University also released the MedMNIST Classification Decathlon as a lightweight AutoML benchmark for medical image classification.
The researchers used the MedMNIST Classification Decathlon to evaluate algorithm performance on all 10 datasets, comparing several baseline methods including ResNets (18, 50), auto-sklearn, AutoKeras, and Google AutoML Vision.

The experimental results show that no single algorithm in the experiment generalized well across all 10 datasets. This finding matters for the search for AutoML algorithms that generalize well across different data modalities, task types, and data scales.
The MedMNIST classification decathlon benchmark will promote future research on AutoML for medical image analysis.
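The idea of judging one method across every dataset can be sketched as follows. This is a hypothetical illustration and not the benchmark's official evaluation code: the function names, the toy datasets, and the use of plain accuracy as the metric are all assumptions, and the numbers it prints are not results from the paper.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


def decathlon_summary(predict_fn, datasets):
    """Score one method on every dataset; return per-dataset scores and their mean."""
    scores = {}
    for name, (x_test, y_test) in datasets.items():
        scores[name] = accuracy(y_test, predict_fn(x_test))
    return scores, float(np.mean(list(scores.values())))


def always_zero(x):
    """Trivial baseline (hypothetical): predict class 0 for every image."""
    return np.zeros(len(x), dtype=int)


# Two toy "datasets" with made-up labels, standing in for the 10 real ones.
rng = np.random.default_rng(0)
toy_datasets = {
    "toy_a": (rng.random((4, 28, 28)), np.array([0, 0, 0, 1])),
    "toy_b": (rng.random((4, 28, 28)), np.array([1, 1, 0, 1])),
}

per_dataset, mean_score = decathlon_summary(always_zero, toy_datasets)
print(per_dataset, mean_score)
```

Reporting the per-dataset scores alongside the mean makes it visible when a method does well on one dataset and poorly on another, which is the kind of gap the experiment above describes.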
Related papers:
Open source address: https://github.com/MedMNIST/MedMNIST
Now download the dataset and start your training
Download the dataset, train the machine learning model online, and start your practice with OpenBayes.
OpenBayes is a cloud service platform that provides cloud computing power for machine learning. It has a large-scale supercomputing cluster, supports GPU and CPU computing resources of various configurations, and has a general-purpose machine learning modeling system that can be used out of the box. Intelligent systems can be quickly established without machine learning experience.
Currently, OpenBayes' compute container products support different versions and types of standard machine learning frameworks, such as TensorFlow, PyTorch, MXNet, Darknet, and cpp-develop, along with various common dependencies, in both CPU and GPU environments.

OpenBayes also provides CPU, NVIDIA T4, NVIDIA Tesla V100, and other computing resources. Whether the workload is centralized training on massive data or a low-power, always-on model deployment, it can meet user needs.

The MedMNIST dataset is now available on OpenBayes.

Visit openbayes.com and register as a new user with the invitation code [HyperAI] to receive 240 minutes of CPU + 180 minutes of NVIDIA vGPU free computing power per week.
Visit the following link to start your MedMNIST exploration journey!
Link: http://dwz.date/dew2