Audit of 2.5 million biomedical papers uncovers fake citations

A comprehensive audit of 2.5 million biomedical research papers has uncovered a surge in fake citations, identifying nearly 3,000 papers that contain references which cannot be traced to real publications. Published in The Lancet on May 7, the study is the first academic attempt to estimate the scale of the problem in biomedical literature. It finds that the contamination of scientific papers with fabricated references is escalating rapidly: the number of affected publications increased twelvefold between 2023 and 2025.

The investigation used an automated pipeline designed by the study's authors, including AI researcher Maxim Topaz of Columbia University, to screen papers from PubMed Central. The system analyzed references in papers published between January 2023 and February 2026. It inspected the 125.6 million references cited across the dataset, focusing on the 97 million entries with valid Digital Object Identifiers (DOIs) or PubMed IDs. Large language models were used to detect mismatches between the titles given in the citations and the actual titles found in major scholarly databases, including PubMed, Crossref, OpenAlex, and Google Scholar. A reference whose title did not appear in any of these repositories was classified as fabricated.

The analysis identified 2,564 papers containing one or two fake references and 246 papers with three or more, 2,810 in total. A manual check of 500 flagged references by three independent reviewers confirmed that seven out of ten were indeed fabricated. The authors nevertheless caution that these figures are conservative. Topaz described the findings as only the tip of the iceberg, noting that the true prevalence is likely higher. The study suggests that generative artificial intelligence may be driving the trend, although it remains unclear whether the fakes are produced by computers or by humans.
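
The title check described above can be illustrated with a small sketch. This is not the authors' pipeline: the study used large language models and live scholarly databases, whereas the code below swaps in plain fuzzy string matching from Python's standard library and a pair of in-memory dictionaries. The function names, the 0.9 similarity threshold, the status labels, and all identifiers and titles are assumptions made up for illustration.

```python
from difflib import SequenceMatcher


def normalize(title):
    """Lowercase a title, turn punctuation into spaces, collapse whitespace."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in title)
    return " ".join(cleaned.split())


def titles_match(cited, indexed, threshold=0.9):
    """Fuzzy title comparison. The 0.9 threshold is an arbitrary assumption."""
    ratio = SequenceMatcher(None, normalize(cited), normalize(indexed)).ratio()
    return ratio >= threshold


def classify_reference(ref, databases, threshold=0.9):
    """Classify one reference against hypothetical {identifier: title} databases.

    Returns one of four illustrative labels:
      "skipped"          - no DOI or PubMed ID, so outside the screened set
      "verified"         - cited title matches the record under its identifier
      "exists_elsewhere" - title found in a database, but under another record
      "flagged"          - title found in none of the databases
    """
    identifier = ref.get("doi") or ref.get("pmid")
    if not identifier:
        return "skipped"

    # Stage 1: compare the cited title with the title indexed under the identifier.
    for records in databases.values():
        indexed = records.get(identifier)
        if indexed is not None and titles_match(ref["title"], indexed, threshold):
            return "verified"

    # Stage 2: search every database for the cited title itself.
    for records in databases.values():
        for indexed in records.values():
            if titles_match(ref["title"], indexed, threshold):
                return "exists_elsewhere"

    return "flagged"


# Toy data, invented for this example; none of it refers to real publications.
TOY_DATABASES = {
    "pubmed": {
        "pmid:0000001": "Example study of sepsis prediction in intensive care",
    },
    "crossref": {
        "10.0000/toy-a": "Example study of sepsis prediction in intensive care",
        "10.0000/toy-b": "Example cohort of nurse staffing and patient outcomes",
    },
}

TOY_REFERENCES = [
    {"doi": "10.0000/toy-a",
     "title": "Example Study of Sepsis Prediction in Intensive Care."},
    {"doi": "10.0000/toy-a",
     "title": "Example cohort of nurse staffing and patient outcomes"},
    {"doi": "10.0000/toy-a",
     "title": "Invented trial that no database lists"},
    {"title": "Reference without any identifier"},
]

for ref in TOY_REFERENCES:
    print(classify_reference(ref, TOY_DATABASES), "|", ref["title"])
```

In this sketch only the last category would count as a candidate fabrication. A real screen would still need human review of flagged items, as the study's manual check of 500 references shows.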
Kathryn Weber-Boer, director of scientometrics at Digital Science, called the study a solid initial contribution to understanding the problem. She said the rapid growth in fake citations points toward an AI component. Weber-Boer also highlighted a limitation of the verification process: Google Scholar is not a fully reliable source, because some fabricated references may be listed there without tracing back to genuine publications.

An earlier analysis by Nature estimated that approximately 1.6% of publications in 2025 contained at least one reference to a non-existent publication.

The findings underscore a critical challenge for the integrity of biomedical science, where the rapid adoption of AI tools could pollute the scientific record, whether inadvertently or intentionally. As the volume of AI-generated content grows, ensuring the accuracy of citations remains a pressing concern for the research community.
