Date

6 months ago

License

Apache 2.0

LongBench-Pro is a dataset released in 2025 for evaluating long-context language models. It systematically assesses a model's ability to understand and process long texts across different context lengths, task types, and operating conditions. The dataset contains 1,500 samples spanning 11 Level 1 tasks and 25 Level 2 tasks. Tasks are classified as full-context or partial-context according to how they use the context. Samples are in English and Chinese, with a balanced distribution between the two languages. Task difficulty has four levels: Easy, Medium, Hard, and Extreme. The samples cover six context-length ranges from 8k to 256k tokens and are evenly distributed across them.
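The counts in the description imply simple per-group sizes. The following is a minimal sketch of that arithmetic, assuming "evenly distributed" and "balanced" mean exactly equal splits. The field names `length_range` and `language`, the labels `"en"` and `"zh"`, and the helper functions are hypothetical and are not taken from the dataset's actual schema.

```python
from collections import Counter

TOTAL_SAMPLES = 1500       # stated in the description
NUM_LENGTH_RANGES = 6      # stated: six ranges from 8k to 256k tokens
LANGUAGES = ("en", "zh")   # hypothetical labels for English and Chinese


def expected_per_group(total: int, num_groups: int) -> int:
    """Return the group size if `total` samples split exactly evenly."""
    if total % num_groups != 0:
        raise ValueError("total does not split evenly across groups")
    return total // num_groups


def tally(samples: list[dict], key: str) -> Counter:
    """Count samples per value of `key` (a hypothetical field name)."""
    return Counter(sample[key] for sample in samples)


# Expected sizes under the exact-split assumption.
per_length_range = expected_per_group(TOTAL_SAMPLES, NUM_LENGTH_RANGES)
per_language = expected_per_group(TOTAL_SAMPLES, len(LANGUAGES))

# Toy records with hypothetical fields, only to show how a tally could be
# compared against the expected sizes once real records are loaded.
toy_samples = [
    {"language": "en", "length_range": "8k"},
    {"language": "zh", "length_range": "8k"},
    {"language": "en", "length_range": "256k"},
]
toy_language_counts = tally(toy_samples, "language")
```

Under these assumptions each length range would hold 250 samples and each language 750. Comparing a real tally against those numbers is a quick check on a downloaded copy, provided the field names are adjusted to the actual schema.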

This dataset is contributed by community users and is intended for educational and informational purposes only. If any content involves copyright infringement, please contact us at [email protected] for prompt review and removal.

LongBench-Pro Long Context Comprehensive Evaluation Dataset

Related Datasets

Sutra 10B Pretraining Teaching and Training Dataset

Lung Cancer Clinical Dataset

CL-bench Context Learning Evaluation Benchmark Dataset

GroundingME Complex Scene Understanding Evaluation Dataset

MCIF Multimodal Cross-Language Instruction Following Dataset

TxT360-3efforts Multi-Task Inference Dataset

X-ray Contraband Detection Dataset
