AI Agents Bypass Detection in Online Studies, Threatening Social Science Research Integrity
The rapid advancement of artificial intelligence is threatening the reliability of online studies, a cornerstone of modern social science research. For years, researchers have used online surveys, games, and experiments to gather data from large, diverse populations quickly and affordably. To ensure data quality, they have developed detection tools to filter out inattentive participants, bots, and fraudulent users. But recent findings show that today’s large language models (LLMs) can now mimic human behavior so convincingly that they routinely bypass these safeguards.

Sean Westwood, a political scientist at Dartmouth College, demonstrated this in a study published in the Proceedings of the National Academy of Sciences. He built an AI agent on OpenAI’s o4-mini model that could extract survey questions and answer options, generate responses, and feed them back into survey platforms. Across more than 300 trials, the agent consistently evaded detection. It even passed a classic test designed to catch AI, responding “17” instead of giving the first five digits of pi. The agent also reproduced realistic human behaviors: typing letter by letter, making and correcting typos, and mimicking natural mouse movements.

The AI could also adopt personas. When prompted to act as a Ph.D. in science, it solved complex math problems; when posing as a wealthy individual, it reported a higher income and a larger home.

These capabilities alarm researchers. “It’s like what Nietzsche said about God: It’s dead and we killed it,” said Jon Roozenbeek, a computational social scientist at the University of Cambridge, referring to the end of the era of easy, large-scale data. Anne-Marie Nussberger, a behavioral scientist at the Max Planck Institute, warned that even if only a small fraction of participants use such tools, the ease of scaling them means a large volume of fake data could be generated.
Nussberger also noted that human participants may now change their behavior simply because they know AI is in play, for example switching strategies in a game if they suspect their opponent is an AI.

Platforms like Prolific and CloudResearch are taking notice. Andrew Gordon of Prolific called Westwood’s work a “warning shot.” Leib Litman, chief research officer at CloudResearch, said his team has already identified global click farms submitting fraudulent survey responses. While the company’s current detection system, which analyzes mouse movements, caught the AI agent in Westwood’s study, Litman stressed that the threat evolves rapidly, sometimes within days.

The challenge is especially acute on mobile devices, where mouse-movement tracking does not apply. Yamil Velez of Columbia University is exploring detection based on physical interaction, such as asking users to block and unblock their device’s camera at set intervals.

If online research becomes unreliable, social scientists may lose access to diverse, international samples. But Roozenbeek argues that the supposed representativeness of online data has been overstated all along: many studies still attract mostly urban, educated participants. He advocates stronger international collaboration to achieve real diversity.

Despite the risks, some researchers still see value in online data. Robert West of EPFL believes it remains useful for certain purposes. But for studies requiring authentic human behavior, he says: “Right now, I’d be very, very skeptical.” The era of easy, cheap, large-scale data may be over, and the field must adapt.
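To see how little machinery the human-mimicry described above requires, here is a minimal sketch of simulated human-like typing with occasional typo-and-correction. This is purely illustrative: the function name, parameter values, and event format are invented for this example and are not taken from Westwood’s agent.

```python
import random

def simulate_typing(text, typo_rate=0.05, base_delay=0.12, jitter=0.04):
    """Return a list of (key, delay_seconds) keystroke events with
    human-like timing and occasional typo-then-correction.
    All parameter values are made up for illustration; they are not
    drawn from the study."""
    events = []
    for ch in text:
        # Gaussian inter-key delay, clamped to a small positive floor
        delay = max(0.02, random.gauss(base_delay, jitter))
        if ch.isalpha() and random.random() < typo_rate:
            # hit a wrong key, then backspace before typing the real one
            wrong = random.choice("abcdefghijklmnopqrstuvwxyz")
            events.append((wrong, delay))
            events.append(("BACKSPACE", max(0.02, random.gauss(base_delay, jitter))))
        events.append((ch, delay))
    return events
```

Replaying such events into a browser form produces keystroke timings that look statistically noisy rather than machine-regular, which is exactly what naive bot detectors key on.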
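The mouse-movement analysis mentioned above can likewise be sketched with a toy heuristic: flag trajectories that are near-perfectly straight or move at near-constant speed, patterns typical of naive scripts rather than human hands. The thresholds and input format here are assumptions for illustration, not any platform’s actual detector.

```python
import statistics

def looks_scripted(points, linearity_thresh=0.999, speed_cv_thresh=0.05):
    """Flag a cursor path as bot-like if it is near-perfectly straight
    or moves at near-constant speed. Toy heuristic with invented
    thresholds. `points` is a list of (x, y, timestamp_seconds)."""
    if len(points) < 3:
        return False

    def dist(a, b):
        return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

    path_len = sum(dist(points[i], points[i + 1]) for i in range(len(points) - 1))
    if path_len == 0:
        return True  # no movement at all
    # 1.0 means the path is a perfect straight line
    linearity = dist(points[0], points[-1]) / path_len
    speeds = [dist(points[i], points[i + 1]) / max(points[i + 1][2] - points[i][2], 1e-6)
              for i in range(len(points) - 1)]
    mean_speed = statistics.mean(speeds)
    speed_cv = statistics.pstdev(speeds) / mean_speed if mean_speed else 0.0
    return linearity > linearity_thresh or speed_cv < speed_cv_thresh
```

The catch, as the article notes, is that such signals are an arms race: an agent that already fakes jittery typing can add jittery cursor paths just as easily, and the signal vanishes entirely on touchscreens.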
