HyperAI

M3-Bench Long Video Question Answering Benchmark Dataset

Date

a month ago

Organization

ByteDance Seed

Publish URL

huggingface.co

Paper URL

2508.09736

License

Non-commercial use

M3-Bench is a long-video question answering benchmark dataset released by the ByteDance Seed team in 2025. It accompanies the paper "Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory" and is designed to evaluate the long-term memory and reasoning abilities of multimodal agents.

The dataset contains 1,020 video samples, each of which includes captions, intermediate outputs, and memory maps. The core task of M3-Bench is open-ended long-video question answering (VQA): each video is accompanied by a set of open-ended questions.

Data composition:

  • M3-Bench-robot: 100 newly recorded videos of real-world scenarios, captured by the research team from a robot's first-person perspective
  • M3-Bench-web: 920 long videos from the internet, covering a wider range of content and scenarios
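
The composition above can be written down as a small Python sketch that checks the two subsets add up to the stated 1,020 videos. The `Subset` class, its field names, and the helper functions are hypothetical and used only for illustration; they are not part of the released dataset or any official loader.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subset:
    """Hypothetical record describing one M3-Bench subset."""
    name: str
    num_videos: int
    source: str


# Counts taken from the description above; field names are illustrative.
SUBSETS = [
    Subset("M3-Bench-robot", 100, "recorded by the research team (robot's perspective)"),
    Subset("M3-Bench-web", 920, "collected from the internet"),
]


def total_videos(subsets):
    """Sum the video counts across all subsets."""
    return sum(s.num_videos for s in subsets)


def share(subset, subsets):
    """Fraction of the whole benchmark contributed by one subset."""
    return subset.num_videos / total_videos(subsets)


if __name__ == "__main__":
    for s in SUBSETS:
        print(f"{s.name}: {s.num_videos} videos ({share(s, SUBSETS):.1%})")
    print("total:", total_videos(SUBSETS))
```

On these figures the robot subset is roughly a tenth of the benchmark, so any metric averaged over all videos will be dominated by the web subset unless the two subsets are reported separately.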