Date

5 months ago

Organization

Paper URL

2602.03587

License

Other

Tags

LLM

Text Generation

Benchmarks

CL-bench is a benchmark dataset for evaluating the context learning capabilities of a large language model, released in 2026 by Tencent's Hunyuan team in collaboration with Fudan University. The related research papers are as follows: CL-bench: A Benchmark for Context LearningThe aim is to test whether a model can learn new rules, concepts, or domain knowledge from a given context without relying on pre-trained knowledge and apply them to subsequent tasks. This dataset contains 500 complex context scenarios, covering 1,899 specific tasks, and provides 31,607 fine-grained evaluation rubrics. Each task is organized in a multi-turn dialogue format, covering various context learning scenarios such as rule reasoning, domain knowledge learning, and complex instruction understanding, to evaluate the model's ability to understand, summarize, and transfer new information in the context.

This dataset is contributed by community users and is intended for educational and informational purposes only. If any content involves copyright infringement, please contact us at [email protected] for prompt review and removal.

Related Datasets

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding

Ready-to-use GPUs

Best Pricing

Get Started View Pricing

HyperAI Newsletters

Subscribe to our latest updates

We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning

HyperAI

Use this Dataset Discuss on Discord

Date

5 months ago

Organization

Paper URL

2602.03587

License

Other

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding

Ready-to-use GPUs

Best Pricing

Get Started View Pricing

HyperAI Newsletters

Subscribe to our latest updates

We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning

Command Palette

CL-bench Context Learning Evaluation Benchmark Dataset

Build AI with AI

HyperAI Newsletters

Command Palette

CL-bench Context Learning Evaluation Benchmark Dataset

Related Datasets

DRACO Cross-Disciplinary Deep Research Benchmark Dataset

Nemotron Personas France (French Synthetic Personas Dataset)

Groundsource Global Flood Events Dataset

CHIMERA General Inference Synthetic Dataset

Open-RL Inference Problem Dataset

Pan-Cancer scRNA-Seq Cancer Single-Cell Transcriptional Atlas Dataset

RubricHub_v1 Multi-Domain Generative Task Dataset

Nemotron-Personas-Brazil Brazilian Synthetic Character Dataset

RoVid-X Robot Video Generation Dataset

Google StreetView National Street View Image Dataset

DeepPlanning Long-Term Planning Capability Assessment Dataset

Vehicles OpenImages Vehicle Image Dataset

LightOnOCR-mix-0126 Text Transcription Dataset

Nemotron-Math-v2 Mathematical Inference Dataset

Human Face Emotions Dataset

GroundingME Complex Scene Understanding Evaluation Dataset

MCIF Multimodal Cross-Language Instruction Following Dataset

X-ray Contraband Detection Dataset

LongBench-Pro Long Context Comprehensive Evaluation Dataset

Build AI with AI

HyperAI Newsletters

Command Palette

CL-bench Context Learning Evaluation Benchmark Dataset

Related Datasets

DRACO Cross-Disciplinary Deep Research Benchmark Dataset

Nemotron Personas France (French Synthetic Personas Dataset)

Groundsource Global Flood Events Dataset

CHIMERA General Inference Synthetic Dataset

Open-RL Inference Problem Dataset

Pan-Cancer scRNA-Seq Cancer Single-Cell Transcriptional Atlas Dataset

RubricHub_v1 Multi-Domain Generative Task Dataset

Nemotron-Personas-Brazil Brazilian Synthetic Character Dataset

RoVid-X Robot Video Generation Dataset

Google StreetView National Street View Image Dataset

DeepPlanning Long-Term Planning Capability Assessment Dataset

Vehicles OpenImages Vehicle Image Dataset

LightOnOCR-mix-0126 Text Transcription Dataset

Nemotron-Math-v2 Mathematical Inference Dataset

Human Face Emotions Dataset

GroundingME Complex Scene Understanding Evaluation Dataset

MCIF Multimodal Cross-Language Instruction Following Dataset

X-ray Contraband Detection Dataset

LongBench-Pro Long Context Comprehensive Evaluation Dataset

Build AI with AI

HyperAI Newsletters

Related Datasets

DRACO Cross-Disciplinary Deep Research Benchmark Dataset

Nemotron Personas France (French Synthetic Personas Dataset)

Groundsource Global Flood Events Dataset

CHIMERA General Inference Synthetic Dataset

Open-RL Inference Problem Dataset

Pan-Cancer scRNA-Seq Cancer Single-Cell Transcriptional Atlas Dataset

RubricHub_v1 Multi-Domain Generative Task Dataset

Nemotron-Personas-Brazil Brazilian Synthetic Character Dataset

RoVid-X Robot Video Generation Dataset

Google StreetView National Street View Image Dataset

DeepPlanning Long-Term Planning Capability Assessment Dataset

Vehicles OpenImages Vehicle Image Dataset

LightOnOCR-mix-0126 Text Transcription Dataset

Nemotron-Math-v2 Mathematical Inference Dataset

Human Face Emotions Dataset

GroundingME Complex Scene Understanding Evaluation Dataset

MCIF Multimodal Cross-Language Instruction Following Dataset

X-ray Contraband Detection Dataset

LongBench-Pro Long Context Comprehensive Evaluation Dataset

Related Datasets

DRACO Cross-Disciplinary Deep Research Benchmark Dataset

Nemotron Personas France (French Synthetic Personas Dataset)

Groundsource Global Flood Events Dataset

CHIMERA General Inference Synthetic Dataset

Open-RL Inference Problem Dataset

Pan-Cancer scRNA-Seq Cancer Single-Cell Transcriptional Atlas Dataset

RubricHub_v1 Multi-Domain Generative Task Dataset