AI Judges Texts Fairly—Until It Knows the Source
Large language models (LLMs) are increasingly used not only to generate content but also to evaluate it: grading student essays, summarizing reports, moderating social media posts, and screening job applications. However, a growing body of research reveals a troubling pattern: while LLMs appear to assess text objectively, their evaluations can shift dramatically once they learn the identity of the author, especially if that author is associated with a marginalized group.

Studies have shown that when an LLM evaluates a piece of writing without knowing the author's background, it often rates the work fairly, based on content quality, coherence, and structure. But once the model is told the author is a woman, a person of color, or from a non-Western country, the same text is frequently rated lower. The shift is not caused by any change in the text itself; it comes from the model's learned associations, biases embedded in the vast datasets it was trained on, which often reflect societal inequities.

For example, one experiment found that an AI system rated a well-written essay as less competent when it was attributed to a Black author than when the identical essay was attributed to a white author. Another study showed that job application summaries written by women were consistently scored lower on leadership potential, even when the content was identical. These findings expose a critical flaw: the illusion of neutrality. LLMs are not impartial arbiters. They reflect the biases of their training data, and when they gain access to demographic or social cues, those biases can surface in ways that undermine fairness and equity.
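The paired-evaluation setup these studies describe can be sketched in a few lines. The sketch below is illustrative only: `score` stands in for any model call that returns a numeric quality rating, and `audit_attribution_bias` and the dummy scorer are hypothetical names, not part of any cited study's code. The idea is simply to score the same text with and without an author byline and report the deltas.

```python
def audit_attribution_bias(text, attributions, score):
    """Score identical text under different author attributions and
    report each score's deviation from the anonymous baseline.

    `score` is a hypothetical stand-in for an LLM evaluation call
    that maps a prompt string to a numeric rating.
    """
    baseline = score(text)  # no author information attached
    deltas = {}
    for label, byline in attributions.items():
        prompt = f"Author: {byline}\n\n{text}"
        deltas[label] = score(prompt) - baseline
    return baseline, deltas


# A deliberately biased dummy scorer, just to show the audit at work:
# it docks points whenever any author byline is present.
def dummy_score(prompt):
    s = 80.0
    if prompt.startswith("Author:"):
        s -= 5.0
    return s


baseline, deltas = audit_attribution_bias(
    "A well-argued essay on urban planning...",
    {"A": "Jane Doe", "B": "John Doe"},
    dummy_score,
)
```

With a real model behind `score`, systematically negative deltas for some attributions and not others would be exactly the pattern the studies above report.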
Researchers warn that relying on AI for high-stakes evaluations—such as academic assessments, hiring decisions, or content moderation—without accounting for these hidden biases can perpetuate and even amplify existing inequalities. The problem is not that the models are broken, but that they are trained on a world that is not fair. Solutions are emerging, including debiasing techniques, more diverse training data, and the development of evaluation frameworks that test for fairness. But experts stress that transparency and human oversight remain essential. AI should not be used to make final decisions in sensitive areas without a clear understanding of how and why it reached a given conclusion. Ultimately, the goal is not to eliminate AI from evaluation tasks, but to use it responsibly—ensuring that the systems we build do not simply mirror the biases of the past, but help us move toward a more equitable future.
