
Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA

Pletenev, Sergey ; Marina, Maria ; Ivanov, Nikolay ; Galimzianova, Daria ; Krayko, Nikita ; Salnikov, Mikhail ; Konovalov, Vasily ; Panchenko, Alexander ; Moskvoretskii, Viktor
Release Date: 6/9/2025
Abstract

Large Language Models (LLMs) often hallucinate in question answering (QA) tasks. A key yet underexplored factor contributing to this is the temporality of questions: whether they are evergreen (answers remain stable over time) or mutable (answers change). In this work, we introduce EverGreenQA, the first multilingual QA dataset with evergreen labels, supporting both evaluation and training. Using EverGreenQA, we benchmark 12 modern LLMs to assess whether they encode question temporality explicitly (via verbalized judgments) or implicitly (via uncertainty signals). We also train EG-E5, a lightweight multilingual classifier that achieves SoTA performance on this task. Finally, we demonstrate the practical utility of evergreen classification across three applications: improving self-knowledge estimation, filtering QA datasets, and explaining GPT-4o retrieval behavior.