LongBench-Pro Long Context Comprehensive Evaluation Dataset
LongBench-Pro is a dataset released in 2025 for evaluating long-context language models. It is designed to systematically assess a model's ability to understand and process long texts across different context lengths, task types, and operating conditions.
The dataset contains 1,500 samples spanning 11 Level 1 tasks and 25 Level 2 tasks. Tasks are divided into full-context tasks and partial-context tasks according to how much of the context they use. The samples are in English and Chinese, with the two languages balanced. Task difficulty is graded at four levels: Easy, Medium, Hard, and Extreme. For context length, the samples cover six length ranges from 8k to 256k tokens and are evenly distributed across those ranges.
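The composition described above implies some simple per-group sample counts. The following is a minimal sketch of that arithmetic, assuming "balanced" and "evenly distributed" mean exactly equal splits. The bucket labels (8k, 16k, 32k, 64k, 128k, 256k) and the function name `even_split` are illustrative assumptions, not taken from the dataset itself; the description only states that there are six ranges between 8k and 256k tokens.

```python
# Figures stated in the dataset description.
TOTAL_SAMPLES = 1500
LANGUAGES = ["English", "Chinese"]

# ASSUMPTION: six power-of-two bucket labels between 8k and 256k.
# The description gives only the endpoints and the number of ranges.
LENGTH_BUCKETS = ["8k", "16k", "32k", "64k", "128k", "256k"]


def even_split(total: int, groups: list[str]) -> dict[str, int]:
    """Split `total` samples equally across `groups`.

    Raises ValueError if the total does not divide evenly, since an
    exactly even distribution would then be impossible.
    """
    per_group, remainder = divmod(total, len(groups))
    if remainder != 0:
        raise ValueError(f"{total} samples cannot be split evenly into {len(groups)} groups")
    return {group: per_group for group in groups}


# ASSUMPTION: "balanced" and "evenly distributed" mean exactly equal counts.
samples_per_language = even_split(TOTAL_SAMPLES, LANGUAGES)
samples_per_bucket = even_split(TOTAL_SAMPLES, LENGTH_BUCKETS)

print(samples_per_language)
print(samples_per_bucket)
```

Under these assumptions each language would account for 750 samples and each length range for 250. The actual dataset may deviate slightly from an exact split, so the real files should be checked before relying on these numbers.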