HyperAI

Researchers digitize pollen from 18,000 plant species to create open-access AI-ready database

19 days ago

A team of researchers from the Smithsonian Tropical Research Institute (STRI) is digitizing pollen from more than 18,000 plant species, primarily from tropical regions, as part of an initiative called PollenGEO. The project, described in the journal Plants, People, Planet, aims to create a comprehensive, publicly accessible database of pollen images to support research across multiple fields.

The Smithsonian’s palynological collection, housed at STRI and the National Museum of Natural History, is one of the world’s largest pollen repositories. Each sample is preserved on a microscope slide, often accompanied by an index card describing its source. To digitize the collection, more than 30 researchers and students led by palynologist Carlos Jaramillo have captured more than 40 million images of pollen grains. Approximately 100 volunteers working through the Smithsonian Transcription Center helped transcribe the index cards into a digital format.

The collection draws on several key sources, including the Graham Palynological Collection, the Joan Nowicke collection, the Barro Colorado Island collection by Dave Roubik and Enrique Moreno, the Amazon collection by Paul Colinvaux, and the Sian Ka'an collection from southeastern Mexico, which includes 650 species. Around 1,000 fossil pollen samples from museum archives have also been scanned.

Pollen is valuable to science because it is durable: some grains can survive for hundreds of millions of years, preserving a detailed record of Earth’s past. Each plant species produces distinctively shaped pollen, which makes it a powerful tool for identifying ancient flora, understanding climate change, dating geological layers, and even supporting forensic analysis, such as determining the origin of clothing found at crime scenes.

Traditionally, identifying pollen has required hours of microscopic examination by specialists working from printed reference guides. The process is especially difficult in tropical regions, where biodiversity is high and many species remain unnamed. The new database is intended to let machine learning models analyze pollen quickly and accurately, dramatically speeding up research.

Associate Professor Surangi Punyasena of the University of Illinois Urbana-Champaign is developing the AI system that will be trained on the PollenGEO dataset. The project supports larger scientific efforts, including the Trans-Amazon Drilling Project, which uses pollen from core samples to reconstruct the Amazon’s ecological history. Researchers from institutions in Brazil and the UK are collaborating on this initiative.

The PollenGEO database will be freely available online, turning pollen analysis from a niche, manual task into a digital, collaborative, and scalable process. Andrés Díaz recently presented a webinar in Spanish explaining the digitization process and its scientific impact.