Pinocchio Factual Knowledge Evaluation Dataset

The Pinocchio dataset was jointly created by researchers from Tsinghua University, the University of Illinois at Chicago, and the University of Cambridge. Its purpose is to comprehensively evaluate how well large language models (LLMs) store factual knowledge and reason over it.
The dataset contains 20,000 diverse factual questions drawn from different sources, time periods, domains, regions, and languages. It is organized into 7 tasks that test an LLM's ability to reason over multiple facts, handle structured and unstructured knowledge, identify subtle factual differences, and resist adversarial examples. Pinocchio gives researchers a powerful tool for understanding model capabilities at multiple levels while pushing the limits of what LLMs can do with factual knowledge.