
TreeOfLife-10M Biological Image Dataset

Date

a year ago

Organization

Microsoft Research

With over 10 million images covering 454,000 taxa in the Tree of Life, TreeOfLife-10M is the largest dataset of ML-ready biological organism images and associated taxonomic labels to date. It builds on existing high-quality datasets such as iNat21 and BIOSCAN-1M and adds newly curated images from the Encyclopedia of Life (eol.org), which supply most of the data diversity in TreeOfLife-10M. Each image is labeled to the most specific taxonomic level available, as well as to all higher taxonomic levels in the Tree of Life (see the Text Type section). TreeOfLife-10M was created to train BioCLIP and future biology-based models.
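To illustrate what "labeled to the most specific level, plus all higher levels" means, here is a minimal sketch of one such label. It assumes the seven standard Linnaean ranks (kingdom through species) and treats any missing rank as the point where identification stops. The `TaxonLabel` class and its field and method names are hypothetical and are not the dataset's actual schema or file format.

```python
from dataclasses import dataclass, fields
from typing import List, Optional, Tuple


@dataclass
class TaxonLabel:
    """Hypothetical label record: seven Linnaean ranks, highest to lowest.

    A rank left as None means the image was not identified that far down.
    """
    kingdom: Optional[str] = None
    phylum: Optional[str] = None
    class_: Optional[str] = None  # trailing underscore because "class" is reserved
    order: Optional[str] = None
    family: Optional[str] = None
    genus: Optional[str] = None
    species: Optional[str] = None  # species epithet only, e.g. "leucocephalus"

    def lineage(self) -> List[Tuple[str, str]]:
        """(rank, name) pairs from kingdom down to the most specific known rank."""
        pairs = []
        for f in fields(self):
            name = getattr(self, f.name)
            if name is None:
                break  # ranks below a missing one are treated as unknown
            pairs.append((f.name.rstrip("_"), name))
        return pairs

    def most_specific_rank(self) -> Optional[str]:
        """Name of the lowest rank that carries a label, or None if unlabeled."""
        pairs = self.lineage()
        return pairs[-1][0] if pairs else None

    def taxonomic_string(self) -> str:
        """Space-separated names from kingdom down to the most specific rank."""
        return " ".join(name for _, name in self.lineage())


# Usage: one image identified to species, another only to genus.
eagle = TaxonLabel("Animalia", "Chordata", "Aves", "Accipitriformes",
                   "Accipitridae", "Haliaeetus", "leucocephalus")
genus_only = TaxonLabel("Animalia", "Chordata", "Aves", "Accipitriformes",
                        "Accipitridae", "Haliaeetus")

print(eagle.most_specific_rank())       # species
print(eagle.taxonomic_string())         # Animalia Chordata Aves Accipitriformes Accipitridae Haliaeetus leucocephalus
print(genus_only.most_specific_rank())  # genus
```

Stopping at the first missing rank keeps every label a clean prefix of the full lineage, so an image identified only to genus still carries correct kingdom-through-genus information. Real records may handle gaps differently.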

The dataset can be used in multiple fields, including biodiversity research, species identification, natural language processing tasks, machine learning, and computer vision research.

This dataset was released in 2024 by The Ohio State University, Microsoft Research, and other institutions. The accompanying paper, "BioCLIP: A Vision Foundation Model for the Tree of Life", received a Best Student Paper award at CVPR 2024.