AI Assists IRBs in Ethical Research Review Amid Debate Over Human Oversight and Bias
Ethicists are exploring whether artificial intelligence can help review human research proposals, aiming to ease the burden on overworked institutional review boards (IRBs). Philip Nickel, a biomedical ethicist at Eindhoven University of Technology, recalls the tedium of IRB work: sifting through lengthy, often poorly written applications. He and others believe that AI, particularly large language models (LLMs) such as ChatGPT, Claude, and Gemini, could streamline the process by flagging common errors, missing information, and ethical red flags early in review.

While some see AI as a promising way to improve efficiency and consistency, others warn of risks. Overreliance on AI could breed complacency, and models may reproduce biases present in their training data. Still, given the growing backlog of IRB applications and the shortage of qualified reviewers, some argue that not using AI could itself be ethically problematic. Sebastian Porsdam Mann, a bioethicist at the University of Copenhagen, calls the exploration of AI in IRBs "ethically mandatory" in the current climate.

Recent studies show promise. One found that several off-the-shelf LLMs accurately detected flaws in health study designs, such as weak risk-benefit analyses or inadequate participant protections. Another, a preprint, reported that GPT-4o and Gemini 1.5 Pro identified 100% of the issues in 50 animal research proposals submitted to an IRB-like panel.

Researchers are now proposing to fine-tune LLMs on real IRB data, such as past decisions, institutional policies, and ethical guidelines, to align them with specific boards' standards and cultural contexts. This could help models better reflect how human reviewers think. Using reasoning-capable models, such as OpenAI's o-series or Anthropic's Sonnet, which lay out their logic step by step, may also increase transparency and trust. The goal is not to replace human reviewers but to offload routine checks so that experts can focus on complex ethical dilemmas. Seah Jiehao, a bioethicist at the National University of Singapore, emphasizes that AI should handle "mundane matters" so humans can concentrate on deeper ethical questions.

Yet concerns remain. Holly Fernandez Lynch of the University of Pennsylvania warns that commercial IRBs, many of which operate without federal oversight, might prioritize speed and profit over thoroughness, using AI to cut corners. Donna Snyder of WCG IRB counters that AI could instead help experts quickly locate relevant precedents, improving the quality of decisions.

For under-resourced IRBs, especially in the Global South, AI could be a lifeline. Keymanthri Moodley of Stellenbosch University describes the strain of managing overwhelming workloads with limited staff. But she cautions that models trained on Western ethical frameworks may not transfer to African contexts, risking cultural misalignment.

Some researchers are already testing AI tools. Steph Grohmann of the Ludwig Boltzmann Gesellschaft developed EthicAlly, a prototype based on Claude Sonnet 4 that helps researchers identify ethical issues in social science proposals. In a test on 25 fictional cases, it correctly flagged ethical concerns in 24, from missing consent details to serious problems such as scientific racism. Grohmann and her colleagues plan to evaluate multiple commercial models and eventually to develop open-source, institution-run versions that operate locally, protecting data privacy. They believe that public, transparent, community-owned models may be the key to earning trust.
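To make the idea of an institution-run screening tool concrete, the sketch below shows what a minimal local pre-screen might look like. It is illustrative only: it assumes an open-weight model hosted on institutional hardware behind an OpenAI-compatible endpoint (as local runtimes such as Ollama or vLLM provide), and the model name, endpoint, and checklist prompt are assumptions for illustration, not part of EthicAlly or any published system.

```python
# Illustrative sketch: pre-screening a research proposal with a locally hosted,
# open-weight LLM exposed through an OpenAI-compatible API (e.g., a local server
# at http://localhost:11434/v1). Model name and prompt are hypothetical.
from openai import OpenAI

# A local server keeps proposal text on institutional hardware; no API key is needed.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")

CHECKLIST_PROMPT = (
    "You are assisting an institutional review board. Read the research proposal "
    "and list, point by point: (1) missing or unclear informed-consent details, "
    "(2) weaknesses in the risk-benefit analysis, (3) inadequate participant "
    "protections, and (4) any other ethical red flags. Flag issues for human "
    "reviewers; do not approve or reject the proposal."
)

def prescreen(proposal_text: str) -> str:
    """Return a draft list of potential ethical issues for human reviewers to assess."""
    response = client.chat.completions.create(
        model="llama3",  # placeholder for whichever open-weight model the board hosts
        messages=[
            {"role": "system", "content": CHECKLIST_PROMPT},
            {"role": "user", "content": proposal_text},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    with open("proposal.txt", encoding="utf-8") as f:
        print(prescreen(f.read()))
```

In a setup like this, keeping the model and the proposal text on institutional hardware is what protects participant data, and the output is only a draft checklist: the decision itself stays with the board.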
Still, Fernandez Lynch stresses that IRBs are fundamentally human processes. “There’s something inherently valuable and worth preserving about a group of people carefully deliberating together about whether research participants are being ethically protected,” she says. AI may assist, but the heart of ethical review lies in collective judgment and moral reflection.