Shanghai Jiao Tong University Releases MedMNIST Medical Image Analysis Dataset & New Benchmark

5 years ago

Medical image analysis is a very complex interdisciplinary field. Recently, Shanghai Jiao Tong University released the MedMNIST dataset, which is expected to promote the development of medical image analysis.

The headache of medical image analysis

Medical image analysis is a recognized "difficult" topic.

First, it is an interdisciplinary field. Practitioners need a broad knowledge background. Even a computer vision specialist or a practicing clinician has, at best, taken only half a step toward medical image analysis.

Suppose, optimistically, that after years of study and research you have finally mastered both computer vision and clinical medicine. The next step brings a new worry: the data come from many sources, including X-ray, CT, and ultrasound, and it is hard to analyze and process so many non-standard datasets across different modalities.

That is not all. Although deep learning dominates research and applications in medical image analysis, the labor cost of tuning models is high. AutoML could help, but there are currently almost no AutoML benchmarks for medical image classification.

**MedMNIST Classification Decathlon at a Glance**

Medical image analysis is fraught with difficulties, but the MedMNIST dataset recently released by Shanghai Jiao Tong University provides a powerful tool to solve these long-standing problems.

10 public datasets, 450,000 images reorganized

MedMNIST is a collection of 10 public medical datasets. All data have been preprocessed and split into standard training, validation, and test subsets. The sources cover different imaging modalities, such as X-ray, OCT, ultrasound, and CT, so the collection as a whole is multimodal. Like the MNIST dataset, MedMNIST supports classification tasks on lightweight 28×28 images.

**Data modes, applicable tasks, and number of images of the ten datasets**

MedMNIST has the following characteristics:

Educational: The multimodal data come from multiple public medical image datasets released under Creative Commons (CC) or other free licenses, which makes them easy to use for teaching.

Standardized: All data have been preprocessed into the same format, lowering the entry barrier and making the collection usable by anyone.

Diverse: The multimodal collection covers different data modalities, dataset sizes ranging from 100 to 100,000 images, and a rich set of task types: binary classification, multi-class classification, ordinal regression, and multi-label classification.

Lightweight: The 28×28 image size supports rapid prototyping, fast iteration, and experimentation with multimodal machine learning and AutoML algorithms.
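The task types listed above differ mainly in how model outputs are turned into predictions. The following is a minimal sketch, not the authors' code: the function name `decode_predictions`, the task-name strings, and the choice to treat binary and ordinal tasks as ordinary single-label classification are all assumptions made for illustration.

```python
import numpy as np

# Assumption: binary, multi-class and ordinal tasks are all decoded as
# "pick exactly one class"; multi-label tasks allow several classes at once.
SINGLE_LABEL_TASKS = {"binary", "multi-class", "ordinal-regression"}


def decode_predictions(logits, task, threshold=0.5):
    """Turn raw model scores of shape (n_samples, n_outputs) into predictions."""
    logits = np.asarray(logits, dtype=np.float64)
    if task in SINGLE_LABEL_TASKS:
        # One class index per sample: the highest-scoring output.
        return logits.argmax(axis=1)
    if task == "multi-label":
        # Independent sigmoid per output, then a per-label threshold.
        probs = 1.0 / (1.0 + np.exp(-logits))
        return (probs >= threshold).astype(np.int64)
    raise ValueError(f"unknown task type: {task!r}")


# Hypothetical scores for two samples and three outputs.
logits = np.array([[2.0, -1.0, 0.5], [-0.3, 1.2, 3.0]])
print(decode_predictions(logits, "multi-class"))
print(decode_predictions(logits, "multi-label"))
```

The same scores give one class index per sample in the single-label case and a 0/1 vector per sample in the multi-label case, which is why a benchmark that mixes task types needs per-task handling.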

MedMNIST Dataset

Publisher: Shanghai Jiao Tong University

Number of images: 454,591

Data format: NPZ

Data size: 654 MB

Release date: October 28, 2020

Download address: http://dwz.date/dew2
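Because the data ship as NPZ archives, they can be read with NumPy alone. The sketch below is illustrative only: it assumes each archive stores arrays under keys such as `train_images` and `train_labels` (and likewise for `val` and `test`), with labels of shape `(n, 1)`. To stay self-contained it writes a tiny synthetic archive in that assumed layout instead of downloading the real files, so `make_toy_npz` and `load_splits` are hypothetical helper names.

```python
import os
import tempfile

import numpy as np


def make_toy_npz(path, n_train=8, n_val=2, n_test=2, n_classes=3, seed=0):
    """Write a tiny synthetic archive mimicking the ASSUMED key layout."""
    rng = np.random.default_rng(seed)
    arrays = {}
    for split, n in (("train", n_train), ("val", n_val), ("test", n_test)):
        # 28x28 grayscale images stored as unsigned 8-bit integers.
        arrays[f"{split}_images"] = rng.integers(
            0, 256, size=(n, 28, 28), dtype=np.uint8
        )
        arrays[f"{split}_labels"] = rng.integers(
            0, n_classes, size=(n, 1), dtype=np.uint8
        )
    np.savez(path, **arrays)


def load_splits(path):
    """Return {split: (images scaled to [0, 1], labels)} from an NPZ archive."""
    with np.load(path) as data:
        return {
            split: (
                data[f"{split}_images"].astype(np.float32) / 255.0,
                data[f"{split}_labels"],
            )
            for split in ("train", "val", "test")
        }


toy_path = os.path.join(tempfile.mkdtemp(), "toy_medmnist.npz")
make_toy_npz(toy_path)
splits = load_splits(toy_path)
for name, (images, labels) in splits.items():
    print(name, images.shape, labels.shape)
```

With a real download, only the path passed to `load_splits` would change, provided the key names match; listing `np.load(path).files` is a quick way to check them.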

A decathlon approach creates a new benchmark for AutoML

Inspired by the Medical Segmentation Decathlon, researchers from Shanghai Jiao Tong University also released the MedMNIST Classification Decathlon as a lightweight AutoML benchmark for medical image classification.

The MedMNIST Classification Decathlon evaluates algorithm performance on all 10 datasets. The researchers compared several baseline methods on it, including ResNets (18, 50), auto-sklearn, AutoKeras, and Google AutoML Vision.

The experimental results show that no algorithm in the experiment generalized well across all 10 datasets. This finding matters for the search for AutoML algorithms that generalize well across different data modalities, task types, and data scales.
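One way to check a claim of this kind is to record one score per method and dataset, then ask whether a single method comes out on top everywhere. The sketch below shows only that bookkeeping. The method names, dataset names, and numbers are made-up placeholders, not results from the paper, and `winners_by_dataset` and `single_winner` are hypothetical helpers.

```python
def winners_by_dataset(scores):
    """Map each dataset to the method with the highest score on it.

    `scores` has the form {method: {dataset: score}}, higher is better.
    """
    datasets = sorted({d for per_method in scores.values() for d in per_method})
    return {
        d: max(scores, key=lambda m: scores[m].get(d, float("-inf")))
        for d in datasets
    }


def single_winner(scores):
    """Return the method that wins every dataset, or None if there is none."""
    winners = set(winners_by_dataset(scores).values())
    return winners.pop() if len(winners) == 1 else None


# Placeholder numbers for illustration only; NOT taken from the benchmark.
toy_scores = {
    "method_x": {"dataset_a": 0.9, "dataset_b": 0.6},
    "method_y": {"dataset_a": 0.8, "dataset_b": 0.7},
}
winners = winners_by_dataset(toy_scores)
print(winners)
print(single_winner(toy_scores))
```

In this toy setup each method wins one dataset, so `single_winner` returns `None`, the same qualitative pattern the article describes for the real benchmark.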

The MedMNIST classification decathlon benchmark will promote future research on AutoML for medical image analysis.

Related papers:

https://arxiv.org/pdf/2010.14925.pdf

Open source address:

https://github.com/MedMNIST/MedMNIST

Now download the dataset and start your training

Download the dataset, train the machine learning model online, and start your practice with OpenBayes.

OpenBayes is a cloud service platform that provides cloud computing power for machine learning. It has a large-scale supercomputing cluster, supports GPU and CPU computing resources of various configurations, and has a general-purpose machine learning modeling system that can be used out of the box. Intelligent systems can be quickly established without machine learning experience.

Currently, OpenBayes' compute container products support standard machine learning frameworks of different versions and types, such as TensorFlow, PyTorch, MXNet, Darknet, and cpp-develop, in both CPU and GPU environments, along with various common dependencies.

OpenBayes also provides CPU, NVIDIA T4, NVIDIA Tesla V100, and other computing resources. Whether the job is centralized training on massive data or low-power, long-running model serving, it can meet user needs.

The MedMNIST dataset is now available on OpenBayes.

Visit openbayes.com and register as a new user with the invitation code [HyperAI] to receive 240 minutes of CPU time plus 180 minutes of NVIDIA vGPU time per week as free computing power.

Visit the link below to start your MedMNIST exploration journey!

Link: http://dwz.date/dew2

-- over--
