WideSearch Information Gathering Benchmark Dataset

Date

5 months ago

Organization

Paper URL

License

Other

WideSearch is the first agent evaluation benchmark designed for "broad info-seeking", released by ByteDance's Seed team in 2025. The accompanying paper, "WideSearch: Benchmarking Agentic Broad Info-Seeking", aims to systematically evaluate and improve the reliability and completeness of large language models in large-scale fact collection, synthesis, and verifiable structured output.

The benchmark consists of 200 questions (100 in English and 100 in Chinese) that the research team selected and manually cleaned from real user queries. The questions span more than 15 different fields.

Data Fields:

instance_id: Unique ID of the task; it matches the file name of the gold CSV.
query: A natural language instruction, usually specifying the required column names and asking for the output as a Markdown table.
evaluation: A serialized (string) object used for automatic evaluation, containing:
- unique_columns: primary key columns, used for row alignment;
- required: names of the columns that must appear in the output;
- eval_pipeline: column-level evaluation configuration (such as preprocess, metric, criterion).
language: Task language, either en or zh.
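To make the fields above concrete, here is a minimal Python sketch of how a record might be consumed. It rests on several assumptions: the evaluation string is taken to be JSON (the page only says it is serialized), and the record values, the helper names parse_markdown_table and align_rows, and the case-insensitive key matching are all hypothetical illustrations, not the benchmark's official evaluation code.

```python
import json


def _split_row(line):
    """Split one pipe-delimited Markdown table line into trimmed cells."""
    line = line.strip()
    if line.startswith("|"):
        line = line[1:]
    if line.endswith("|"):
        line = line[:-1]
    return [cell.strip() for cell in line.split("|")]


def parse_markdown_table(text):
    """Parse a simple Markdown table into a list of {column: value} dicts."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip().startswith("|")]
    if len(lines) < 2:
        return []
    header = _split_row(lines[0])
    # lines[1] is the |---|---| separator row, so data starts at index 2.
    return [dict(zip(header, _split_row(ln))) for ln in lines[2:]]


def align_rows(pred_rows, gold_rows, unique_columns):
    """Pair predicted and gold rows that share the same primary key.

    Lowercasing the key is an assumption made for this sketch only.
    """
    def key(row):
        return tuple(row.get(col, "").strip().lower() for col in unique_columns)

    gold_by_key = {key(row): row for row in gold_rows}
    return [(row, gold_by_key[key(row)]) for row in pred_rows if key(row) in gold_by_key]


# Hypothetical record; the values are illustrative, not taken from the dataset.
record = {
    "instance_id": "example_task",
    "query": "List the items as a Markdown table with columns: name, year.",
    "evaluation": json.dumps({
        "unique_columns": ["name"],
        "required": ["name", "year"],
        "eval_pipeline": {"year": {"preprocess": [], "metric": [], "criterion": None}},
    }),
    "language": "en",
}

spec = json.loads(record["evaluation"])  # assumes JSON serialization

# Hypothetical model response and gold rows (the latter would come from the gold CSV).
response = "| name | year |\n|---|---|\n| Alpha | 2020 |\n| Beta | 2021 |"
gold_rows = [{"name": "alpha", "year": "2020"}, {"name": "Gamma", "year": "2019"}]

pred_rows = parse_markdown_table(response)
missing = [col for col in spec["required"] if pred_rows and col not in pred_rows[0]]
pairs = align_rows(pred_rows, gold_rows, spec["unique_columns"])
```

Here missing lists any required column absent from the response, and pairs holds the (predicted, gold) rows matched on unique_columns. Column-level scoring according to eval_pipeline would then run over each pair; it is left out because the page does not describe the individual metrics.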

Data construction and automatic evaluation flow chart

Paper URL

2508.07999

Related Datasets

- DeepPlanning Long-Term Planning Capability Assessment Dataset (6 days ago)
- IF-Bench Infrared Image Understanding Benchmark Dataset (2 months ago)
- PhysToolBench Physics Tool Task Dataset (2 months ago, 1.56 GB)
- Envision Multi-Stage Event Visual Generation Dataset (2 months ago)
- UNO-Bench full-modal Evaluation Benchmark Dataset (3 months ago, 9.71 GB)
- OpenGU Graph Forgetting Comprehensive Evaluation Dataset (2 months ago)
- SSRB Semi-structured Data Natural Language Query Dataset (2 months ago)
- RoVid-X Robot Video Generation Dataset (6 days ago)
- FrontierScience Inference Research Task Evaluation Dataset (2 months ago)
