Hallucinated Citations in NeurIPS Papers Reveal AI's Hidden Flaws Among Top Researchers
AI detection startup GPTZero analyzed all 4,841 papers accepted at the Conference on Neural Information Processing Systems (NeurIPS), one of the most prestigious venues in artificial intelligence, and identified 100 confirmed hallucinated citations across 51 papers: fake references generated by large language models. The findings, shared with TechCrunch, highlight a growing concern about the reliability of AI-assisted academic writing, even among top-tier researchers.

While NeurIPS is known for its rigorous peer-review process and high standards in machine learning and AI research, the presence of fabricated citations underscores the challenges posed by the widespread use of AI tools in academic writing. The sheer volume of submissions, what GPTZero describes as a "submission tsunami," has strained conference review pipelines, making it increasingly difficult for reviewers to catch subtle errors, including made-up references.

It's important to note that the discovery of 100 fake citations across over 4,800 papers is statistically minor; each paper typically includes dozens of references, meaning the overall error rate is extremely low. As NeurIPS told Fortune, which first reported the findings, "Even if 1.1% of the papers have one or more incorrect references due to the use of LLMs, the content of the papers themselves [is] not necessarily invalidated." A flawed citation doesn't automatically undermine a paper's core research or conclusions.

Still, citations matter. They serve as a form of academic currency, signaling a researcher's influence and contribution to the field. When citations are fabricated, even unintentionally, it degrades the integrity of this system and risks misrepresenting scholarly impact. Peer reviewers are not expected to catch every minor error, especially when dealing with thousands of references across hundreds of papers, but the fact that these inaccuracies slipped through raises questions about accountability.
GPTZero emphasizes that its goal was not to shame researchers but to expose a systemic issue: the growing pressure on academics to produce more content faster, often with AI assistance and without adequate safeguards. The company points to a May 2025 paper titled "The AI Conference Peer Review Crisis" that details similar problems at top-tier conferences, including NeurIPS.

The irony is stark: the world's leading AI researchers, whose reputations depend on precision and credibility, are using AI tools to generate citations, yet failing to verify them. If experts can't ensure the accuracy of their AI-assisted work, what does that say about the reliability of AI use in less scrutinized contexts? The incident serves as a cautionary tale about overreliance on LLMs, even in the most elite academic circles.
