HyperAI

TreeOfLife-200M Biological Vision Dataset

TreeOfLife-200M is a large-scale biological vision dataset released by Ohio State University in 2025. It accompanies the paper "BioCLIP 2: Emergent Properties from Scaling Hierarchical Contrastive Learning" and is designed for training vision models for biology. It is currently the largest and most diverse publicly available, machine-learning-ready dataset for biological computer vision.

The dataset contains nearly 214 million images covering 952,000 species categories. It integrates images and metadata from four core biodiversity data providers: the Global Biodiversity Information Facility (GBIF), the Encyclopedia of Life (EOL), BIOSCAN-5M, and FathomNet. It also broadens the range of image contexts by including museum specimens, camera-trap images, and citizen science images.

Dataset Example