BioCLIP 2: AI Model Trained on NVIDIA GPUs Identifies Over a Million Species and Maps Ecosystem Relationships
Tanya Berger-Wolf, director of the Translational Data Analytics Institute and a professor at The Ohio State University, has led the development of BioCLIP 2, a groundbreaking biology-based foundation model trained on the largest and most diverse dataset of organisms ever assembled. The model, which will be presented at the NeurIPS AI research conference in November, can identify over a million species and analyze complex biological relationships without explicit instruction.

Originally inspired by a friendly bet with a colleague about identifying individual zebras faster than a zoologist could, Berger-Wolf's work has evolved into a transformative tool for conservation and ecological research. BioCLIP 2 goes beyond simple image recognition: it can infer traits such as age, sex, and health status in animals and plants, and even organize species by biological characteristics such as beak size in Darwin's finches, without ever being taught what size means.

The model was trained on TREEOFLIFE-200M, a dataset of 214 million images spanning over 925,000 taxonomic classes, covering everything from beetles and fungi to whales and magnolias. This massive collection was curated in collaboration with the Smithsonian Institution, universities, and field experts worldwide. Training took 10 days on 32 NVIDIA H100 GPUs, followed by inference on a cluster of 64 NVIDIA Tensor Core GPUs, with single GPUs handling real-time analysis. The model demonstrated remarkable abilities, learning taxonomic hierarchies on its own, distinguishing juveniles from adults, and detecting plant diseases by separating healthy from diseased leaves.

BioCLIP 2 is open source and available on Hugging Face, where it was downloaded over 45,000 times in a single month. It builds on the success of the original BioCLIP model, which won the Best Student Paper award at CVPR. One of the model's most promising applications is addressing the critical data gap in conservation biology.
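The mechanism behind this kind of open-ended identification can be illustrated with a minimal CLIP-style scoring sketch. This is not BioCLIP 2's actual code, and the embeddings below are random stand-ins for real encoder outputs: the idea is simply that the model maps an image and a set of candidate taxon labels into a shared vector space, then picks the label whose embedding is most similar to the image embedding.

```python
import numpy as np

def cosine_scores(image_emb, label_embs):
    """Cosine similarity between one image embedding and each label embedding."""
    img = image_emb / np.linalg.norm(image_emb)
    labels = label_embs / np.linalg.norm(label_embs, axis=1, keepdims=True)
    return labels @ img

# Toy vectors standing in for encoder outputs; a real system would get
# image_emb from the image encoder and label_embs from the text encoder.
rng = np.random.default_rng(0)
dim = 8
label_names = ["Geospiza fortis", "Geospiza magnirostris", "Orcinus orca"]
label_embs = rng.normal(size=(len(label_names), dim))

# Pretend the image is a slightly noisy view of the first label's concept.
image_emb = label_embs[0] + 0.1 * rng.normal(size=dim)

scores = cosine_scores(image_emb, label_embs)
best = label_names[int(np.argmax(scores))]
print(best)  # → Geospiza fortis
```

Because the label set is just a list of strings, the same scoring loop works for a dozen candidate species or a million, which is what makes this family of models usable without per-species training.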
Many species, including killer whales and polar bears, lack sufficient population data, and smaller, less studied organisms like fungi and insects face even greater challenges. BioCLIP 2 helps bridge this gap by extracting meaningful insights from limited or fragmented data.

Looking ahead, Berger-Wolf's team is developing a wildlife digital twin: a dynamic, interactive simulation of ecosystems that lets scientists study species interactions, test environmental scenarios, and explore ecological relationships in a non-invasive way. This tool could one day be used in public settings, such as zoos, where visitors could experience the world from a zebra's or a spider's perspective.

"This technology gives us a way to understand nature without disrupting it," Berger-Wolf said. "It's not just about data—it's about empathy, imagination, and connection." The model exemplifies how accelerated computing with NVIDIA GPUs is enabling a new era of AI-driven biology and conservation science.
