VideoRewardBench Video Reward Model Evaluation Dataset

Date

16 hours ago

Organization

University of Science and Technology of China

Paper URL

2509.00484

License

MIT

VideoRewardBench, released in 2025 and jointly developed by the University of Science and Technology of China and Huawei Noah's Ark Lab, is the first comprehensive evaluation benchmark that covers four core dimensions of video understanding: perception, knowledge, reasoning, and safety. The associated paper is "VideoRewardBench: Comprehensive Evaluation of Multimodal Reward Models for Video Understanding". The benchmark aims to systematically evaluate how well multimodal reward models make preference judgments and assess the quality of generated responses in complex video understanding scenarios.

The dataset contains 1,563 labeled samples, involving 1,482 different videos and 1,559 different questions. Each sample consists of a video-text prompt, a preferred response, and a rejected response.
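The sample structure described above can be sketched as a small record type together with a pairwise accuracy check. This is a minimal sketch under stated assumptions: the field names (`video_path`, `question`, `chosen`, `rejected`) and the `score` callable are hypothetical and not taken from the dataset's actual schema, and pairwise accuracy is one common way to score preference judgments, not necessarily the benchmark's official protocol.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class PreferenceSample:
    """One benchmark item. Field names are hypothetical, not the real schema."""
    video_path: str  # the video half of the video-text prompt
    question: str    # the text half of the video-text prompt
    chosen: str      # preferred response
    rejected: str    # rejected response


# Hypothetical scorer signature: (video_path, question, response) -> score.
Scorer = Callable[[str, str, str], float]


def pairwise_accuracy(samples: Iterable[PreferenceSample], score: Scorer) -> float:
    """Fraction of samples where the scorer ranks the preferred response higher."""
    items = list(samples)
    if not items:
        raise ValueError("no samples to evaluate")
    correct = sum(
        score(s.video_path, s.question, s.chosen)
        > score(s.video_path, s.question, s.rejected)
        for s in items
    )
    return correct / len(items)
```

Any reward model can be plugged in by wrapping it in a function with the `Scorer` signature; ties count as errors in this sketch.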

Dataset distribution:

By task dimension, the dataset is divided into five categories (perception is split into long-form and short-form), and the overall distribution is relatively balanced:

  • Long-form perception: 283 samples (18.1%)
  • Short-form perception: 413 samples (26.4%)
  • Knowledge: 238 samples (15.2%)
  • Reasoning: 278 samples (17.8%)
  • Safety: 351 samples (22.5%)
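The percentages above follow directly from the per-category counts and the 1,563-sample total. A short sketch of the arithmetic, using only the counts listed on this page (the variable names are illustrative):

```python
# Per-category sample counts as listed above.
counts = {
    "Long-form perception": 283,
    "Short-form perception": 413,
    "Knowledge": 238,
    "Reasoning": 278,
    "Safety": 351,
}

total = sum(counts.values())  # should match the stated 1,563 samples

# Share of each category, rounded to one decimal place.
percentages = {name: round(100 * n / total, 1) for name, n in counts.items()}

for name, pct in percentages.items():
    print(f"{name}: {counts[name]} samples ({pct}%)")
```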

By video duration, the videos are predominantly short:

  • ≤ 1 minute: 59.9%
  • 1–5 minutes: 33.2%
  • > 5 minutes: 6.9%

Text statistics:

  • Average question length: 28.8 words
  • Average response length: 103.8 words
  • Average length of preferred/rejected responses: 102.9 / 104.6 words

Preferred and rejected responses are nearly the same length on average, which suggests that the preference labels reflect answer quality rather than differences in text length.
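A length-bias check of this kind can be sketched as follows. It is an illustrative sketch, not the authors' analysis: the input is assumed to be a list of (preferred, rejected) response pairs, words are counted by whitespace splitting, and the function name `length_stats` is hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple


def length_stats(pairs: List[Tuple[str, str]]) -> Dict[str, float]:
    """Compare word counts of (preferred, rejected) response pairs."""
    if not pairs:
        raise ValueError("no response pairs given")
    chosen_lens = [len(chosen.split()) for chosen, _ in pairs]
    rejected_lens = [len(rejected.split()) for _, rejected in pairs]
    # How often the preferred response is strictly longer than the rejected one.
    longer_preferred = sum(c > r for c, r in zip(chosen_lens, rejected_lens))
    return {
        "avg_chosen_words": mean(chosen_lens),
        "avg_rejected_words": mean(rejected_lens),
        "longer_preferred_rate": longer_preferred / len(pairs),
    }
```

If the two averages are close and the longer response is preferred about half the time, a model cannot score well simply by favoring longer answers.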
