AI Chatbots Suffer 'Brain Rot' from Social Media Junk Data, Study Finds

Artificial intelligence chatbots trained on large volumes of low-quality content, especially popular social media posts, show significant declines in reasoning ability, information accuracy, and ethical judgment, according to a new preprint study posted on arXiv on October 15. The research, led by Zhangyang Wang of the University of Texas at Austin, highlights how the quality of training data directly impacts the performance and behavior of large language models (LLMs), even when that data is grammatically correct and easy to understand.

Wang and his team focused on the effects of “junk data” (short, viral, or sensationalist content often found on platforms like X, formerly Twitter) on model behavior. They trained open-source LLMs, including Meta’s Llama 3 and three versions of Alibaba’s Qwen, on one million public posts from the platform. Qwen is a reasoning-focused model designed to break down problems step by step, while Llama 3 is instruction-tuned and less inherently capable of complex reasoning.

The results showed that models exposed to high proportions of low-quality data began skipping critical reasoning steps, leading to incorrect answers even on simple multiple-choice questions. As the share of junk data in the training set increased, the models’ ability to reason and to retrieve information from long inputs deteriorated further (see the first sketch below). The study, not yet peer-reviewed, underscores a core principle in AI: garbage in, garbage out.

The researchers also evaluated model personality traits using psychological questionnaires. Before training on low-quality data, Llama 3 displayed typical human-like traits such as agreeableness, extroversion, and conscientiousness, along with a small degree of narcissism. After being trained on mostly junk content, however, the model showed a marked increase in negative traits, including signs of psychopathy in one assessment.

The team also tested whether improved prompt engineering could counteract these effects. Even with carefully crafted instructions to encourage reflection and error correction (see the second sketch below), the model trained on pure junk data showed only partial improvement. Increasing the amount of high-quality data during training helped, but not enough to fully restore performance. The model still failed to engage in proper reasoning when asked to review its own mistakes, suggesting that current methods may be insufficient to fix deep-seated issues caused by poor training data.

Mehwish Nasim, an AI researcher at the University of Western Australia, called the findings consistent with long-standing AI principles. “Even before large language models, we always said: if you give garbage to an AI, it will produce garbage,” she said.

The study serves as a warning: as AI systems grow more powerful, the quality of the data they learn from becomes ever more critical. Without careful curation, models risk developing flawed logic, distorted personalities, and unreliable outputs, the “brain rot” the researchers attribute to exposure to online noise.
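
To make the junk-ratio sweep concrete, here is a minimal Python sketch of how such an experiment can be structured: build training mixtures with a controlled share of junk data, then fine-tune and score a model at each ratio. The function names, toy corpora, and ratio grid are illustrative assumptions; the article does not describe the paper's actual pipeline, and the training and evaluation steps are left as comments.

import random

def make_mixture(junk_posts, quality_posts, junk_ratio, total=1000, seed=0):
    # Build a training mixture whose junk share is controlled by junk_ratio.
    rng = random.Random(seed)
    n_junk = int(total * junk_ratio)
    mixture = (rng.choices(junk_posts, k=n_junk)
               + rng.choices(quality_posts, k=total - n_junk))
    rng.shuffle(mixture)
    return mixture

# Toy stand-ins for the two corpora; the study used one million real posts.
junk = [f"viral post {i}" for i in range(100)]
quality = [f"long-form reference text {i}" for i in range(100)]

for junk_ratio in (0.0, 0.2, 0.5, 0.8, 1.0):
    corpus = make_mixture(junk, quality, junk_ratio)
    # A real run would fine-tune the model on `corpus` here and then score it
    # on a reasoning benchmark (e.g. multiple-choice accuracy), tracking how
    # performance falls as junk_ratio rises.
    print(f"junk_ratio={junk_ratio:.1f} -> {len(corpus)} training samples")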
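
The reflection-style prompting the team tested can also be sketched generically. This is a hypothetical sketch, not the paper's actual prompts: `generate` stands in for any text-in/text-out model call, and the wording of the critique prompt is assumed.

def reflect_and_retry(generate, question, first_answer):
    # One round of self-review prompting: ask the model to critique its own
    # answer and produce a corrected one. Prompt wording is illustrative.
    critique_prompt = (
        f"Question: {question}\n"
        f"Your previous answer: {first_answer}\n"
        "Check the answer step by step, point out any reasoning errors, "
        "and then state a corrected final answer."
    )
    return generate(critique_prompt)

# Usage with any model wrapper exposing a text-generation callable:
# first = model.generate(question)
# revised = reflect_and_retry(model.generate, question, first)

Per the article, even this kind of mitigation recovered only part of the lost performance, consistent with its point that prompting alone did not restore reasoning in models trained on junk data.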
