TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning

While large language models (LLMs) have demonstrated strong performance on factoid question answering, they remain prone to hallucination and untruthful responses, particularly when tasks demand information outside their parametric knowledge. Indeed, truthfulness requires more than accuracy: models must also recognize uncertainty and abstain when unsure to avoid hallucinations. This presents a fundamental challenge for existing methods: approaches that optimize for accuracy often amplify hallucinations, while those that encourage abstention can become overly conservative, sacrificing correct answers. Both extremes ultimately compromise truthfulness. In this work, we present TruthRL, a general reinforcement learning (RL) framework that directly optimizes the truthfulness of LLMs. Specifically, we implement TruthRL using GRPO with a simple yet effective ternary reward that distinguishes correct answers, hallucinations, and abstentions. It incentivizes models to reduce hallucinations not only by providing correct responses, but also by abstaining when uncertain, thereby improving truthfulness. Extensive experiments across four knowledge-intensive benchmarks show that, compared to vanilla RL, TruthRL significantly reduces hallucinations by 28.9% and improves truthfulness by 21.1%, with consistent gains across various backbone models (e.g., Qwen, Llama) under both retrieval and non-retrieval setups. An in-depth ablation study demonstrates that vanilla accuracy-driven methods, such as supervised fine-tuning or RL with a binary reward, struggle to balance factual correctness and uncertainty. In contrast, our proposed truthfulness-driven TruthRL achieves strong performance in both accuracy and truthfulness, underscoring the importance of learning-objective design for developing truthful LLMs.
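
To make the ternary reward concrete, the sketch below shows one way such a reward could be scored per rollout. The specific reward values (+1 / 0 / -1), the phrase-based abstention check, and the exact-match correctness check are illustrative assumptions, not the paper's implementation; the abstract only states that the reward distinguishes correct answers, hallucinations, and abstentions.

```python
def ternary_truthfulness_reward(response: str, gold_answers: list[str]) -> float:
    """Score one rollout: +1 correct, 0 abstention, -1 hallucination (values assumed)."""
    normalized = response.strip().lower()

    # Hypothetical abstention check: the model declines to answer when unsure.
    abstention_phrases = ("i don't know", "i am not sure", "cannot answer")
    if any(phrase in normalized for phrase in abstention_phrases):
        return 0.0  # abstention: neutral reward (assumed value)

    # Hypothetical correctness check via substring match against reference answers.
    if any(ans.strip().lower() in normalized for ans in gold_answers):
        return 1.0  # correct answer (assumed value)

    # Confident but wrong response: penalized as a hallucination.
    return -1.0  # hallucination (assumed value)


if __name__ == "__main__":
    print(ternary_truthfulness_reward("The capital of France is Paris.", ["Paris"]))  # 1.0
    print(ternary_truthfulness_reward("I don't know.", ["Paris"]))                    # 0.0
    print(ternary_truthfulness_reward("The capital of France is Lyon.", ["Paris"]))   # -1.0
```

Under a reward of this shape, an abstention outscores a wrong answer but never a correct one, which is what discourages both over-confident hallucination and blanket abstention.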