Date

6 months ago

Organization

Paper URL

LuMGoG6lBA

License

Apache 2.0

Tags

LLM

Natural Language Processing

Retrieval-Augmented Generation

SSRB is a large-scale benchmark dataset for natural language querying of semi-structured data, released in 2025 by Harbin Institute of Technology (Shenzhen) in collaboration with Hong Kong Polytechnic University, Tsinghua University, and other institutions. The related paper, "SSRB: Direct Natural Language Querying to Massive Heterogeneous Semi-Structured Data", was selected for the NeurIPS 2025 Datasets and Benchmarks track. The benchmark is intended to systematically evaluate and advance models' ability to retrieve and understand semi-structured data under complex natural language query conditions. The dataset contains approximately 14 million semi-structured data objects and 8,485 test queries, covering six domains and 99 different schemas. Each query expresses a retrieval requirement over semi-structured data. Query conditions typically combine exact field-matching constraints with fuzzy semantic-matching requirements, and may involve multiple fields and implicit reasoning.
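To make the query structure described above more concrete, here is a minimal sketch of a retrieval step that first applies exact field constraints and then ranks the remaining objects by a fuzzy text match. Everything in it is an assumption made for illustration: the objects, the field names (`category`, `brand`, `description`), and the helper functions `token_overlap` and `retrieve` are hypothetical and are not taken from SSRB or its paper. Token overlap is used only as a crude stand-in for semantic similarity; a real system would more likely use an embedding model.

```python
# Hypothetical semi-structured objects; field names and values are illustrative only.
OBJECTS = [
    {"id": 1, "category": "laptop", "brand": "Acme",
     "description": "lightweight notebook with long battery life"},
    {"id": 2, "category": "laptop", "brand": "Globex",
     "description": "heavy gaming machine with a loud fan"},
    {"id": 3, "category": "phone", "brand": "Acme",
     "description": "lightweight handset with long battery life"},
]


def token_overlap(query, text):
    """Jaccard overlap of lowercase tokens (a crude proxy for semantic similarity)."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    union = q | t
    return len(q & t) / len(union) if union else 0.0


def retrieve(objects, exact, fuzzy_field, fuzzy_query, top_k=1):
    """Filter by exact field constraints, then rank survivors by fuzzy match."""
    # Step 1: keep only objects that satisfy every exact field constraint.
    candidates = [
        obj for obj in objects
        if all(obj.get(field) == value for field, value in exact.items())
    ]
    # Step 2: rank the remaining objects by similarity on the fuzzy field.
    ranked = sorted(
        candidates,
        key=lambda obj: token_overlap(fuzzy_query, obj.get(fuzzy_field, "")),
        reverse=True,
    )
    return ranked[:top_k]


# "A laptop with long battery life": exact constraint on category, fuzzy match on description.
results = retrieve(OBJECTS, {"category": "laptop"}, "description", "long battery life")
print([obj["id"] for obj in results])
```

In this toy setup the phone is excluded by the exact constraint even though its description matches the fuzzy part of the query equally well, which is the kind of interaction between the two condition types that the benchmark's queries are described as exercising.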

This dataset is contributed by community users and is intended for educational and informational purposes only. If any content involves copyright infringement, please contact us at [email protected] for prompt review and removal.

SSRB Semi-structured Data Natural Language Query Dataset

Related Datasets

DRACO Cross-Disciplinary Deep Research Benchmark Dataset

ToolACE Complex Tools Learning Dialogue Dataset

CHIMERA General Inference Synthetic Dataset

THINGS-EEG EEG Dataset

THINGS-MEG Magnetoencephalography Dataset

THINGS-fMRI Functional Magnetic Resonance Imaging Dataset

RoVid-X Robot Video Generation Dataset

LightOnOCR-mix-0126 Text Transcription Dataset

CCTV Incident Fall Detection Dataset

GroundingME Complex Scene Understanding Evaluation Dataset

MCIF Multimodal Cross-Language Instruction Following Dataset

TxT360-3efforts Multi-Task Inference Dataset

LongBench-Pro Long Context Comprehensive Evaluation Dataset
