Verbatim Spans: Query-Conditioned Evidence Extraction Dataset

Verbatim Spans is a multi-domain dataset for query-conditioned evidence extraction, released in April 2026 by TU Wien in collaboration with KRLabs. The accompanying research paper is "ACL-Verbatim: hallucination-free question answering for research". The dataset is intended as a general benchmark for training query-conditioned evidence extraction models, which can be applied to retrieval-augmented generation (RAG) and extractive question answering. It contains 174,383 training rows and 20,174 validation rows drawn from three types of corpora: natural language processing papers, multi-domain question answering, and code and tool outputs. These correspond to paragraph-level, sentence-level, and code-block-level evidence annotation, respectively.
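To make the three annotation granularities concrete, here is a minimal sketch of how a query-conditioned evidence record might be checked and segmented. Everything in it is an assumption for illustration: the `Example` record and its field names (`query`, `context`, `evidence`) are hypothetical and are not the published schema. The segmentation rules (blank lines for paragraphs, a naive punctuation split for sentences, 1-indexed inclusive line ranges for code) are simple stand-ins, not the rules the authors used. The `is_verbatim` check reflects the dataset's name: it assumes gold evidence is copied verbatim from the context.

```python
import re
from dataclasses import dataclass


# Hypothetical record layout; field names are assumptions, not the dataset's schema.
@dataclass
class Example:
    query: str
    context: str
    evidence: list[str]  # gold spans, assumed to be copied verbatim from context


def paragraph_units(text: str) -> list[str]:
    # Paragraph level (as for the NLP-paper corpus): split on blank lines.
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def sentence_units(text: str) -> list[str]:
    # Sentence level (as for the QA corpora): naive split after ., ! or ?
    # followed by whitespace. Real sentence splitters handle abbreviations etc.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def line_range(text: str, start: int, end: int) -> str:
    # Code-block / line-range level (as for code and tool output):
    # return lines start..end, 1-indexed and inclusive.
    lines = text.splitlines()
    return "\n".join(lines[start - 1:end])


def is_verbatim(example: Example) -> bool:
    # Every evidence span must occur as an exact substring of the context.
    return all(span in example.context for span in example.evidence)


if __name__ == "__main__":
    doc = "RAG retrieves passages. It then generates an answer.\n\nA second paragraph."
    ex = Example(query="What does RAG retrieve?", context=doc,
                 evidence=["RAG retrieves passages."])
    print(is_verbatim(ex))            # True
    print(len(sentence_units(doc)))   # 3
    print(len(paragraph_units(doc)))  # 2
```

A pure substring check is deliberately strict. It rejects any paraphrased "evidence", which is the property a verbatim-extraction setup relies on to avoid hallucinated support.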

Data Sources

  • ACL Silver: Covers NLP research papers, annotated at the paragraph level. After cleaning and filtering, it contributes 20,916 training rows and 2,319 validation rows, which represent only a subset of the original corpus.
  • RAGBench: Covers finance, healthcare, law, and general question-answering domains, annotated at the sentence level. It uses a capped, balanced sample of the source data and contributes 101,550 training rows and 15,276 validation rows.
  • Squeez: Covers code and SWE-bench tool outputs, annotated at the code-block/line-range level and framed as structured extraction. It contributes 51,917 training rows and 2,579 validation rows.
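As a quick consistency check, the per-source counts listed above can be summed and compared with the overall split sizes. The short script below uses only the numbers stated on this page. The helper names `totals` and `shares` are illustrative. The script also reports each source's share of a split.

```python
# Split sizes (rows per source) as stated on this page.
SPLITS = {
    "ACL Silver": {"train": 20_916, "validation": 2_319},
    "RAGBench": {"train": 101_550, "validation": 15_276},
    "Squeez": {"train": 51_917, "validation": 2_579},
}


def totals(splits: dict) -> dict:
    # Sum rows per split across all sources.
    out = {"train": 0, "validation": 0}
    for counts in splits.values():
        for split, n in counts.items():
            out[split] += n
    return out


def shares(splits: dict, split: str) -> dict:
    # Fraction of the given split contributed by each source.
    total = totals(splits)[split]
    return {name: counts[split] / total for name, counts in splits.items()}


if __name__ == "__main__":
    print(totals(SPLITS))  # {'train': 174383, 'validation': 20174}
    for name, share in shares(SPLITS, "train").items():
        print(f"{name}: {share:.1%} of training rows")
```

The sums match the stated totals of 174,383 training and 20,174 validation rows. RAGBench makes up about 58.2% of the training rows, Squeez about 29.8%, and ACL Silver about 12.0%.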

Citation

@misc{Recski:2026,
title={ACL-Verbatim: hallucination-free question answering for research},
author={Gábor Recski and Szilveszter Tóth and Nadia Verdha and István Boros and Ádám Kovács},
year={2026},
eprint={2605.21102},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2605.21102},
}