
AutoCaption Video Caption Benchmark Dataset

Date

3 months ago

Paper URL

arxiv.org

License

Apache 2.0


The AutoCaption dataset is a video caption benchmark released by Tjunlp Lab in 2025, accompanying the paper "Evaluating Multimodal Large Language Models on Video Captioning via Monte Carlo Tree Search". It aims to advance research on multimodal large language models in the field of video caption generation.

Dataset structure:

The dataset contains two subsets, totaling 11,184 samples:

  • sft_data: 9,419 samples of supervised fine-tuning data for training caption models
  • mcts_vcb: 1,765 samples with MCTS-generated captions and key points, used for evaluation on the MCTS-VCB benchmark
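As a minimal sketch of the subset layout described above, the two splits and their sample counts can be recorded and sanity-checked like this (the dictionary keys mirror the subset names from this page; the per-split file format of the official release is not specified here and is therefore not assumed):

```python
# Subset names and sample counts as described on this page.
subsets = {
    "sft_data": 9419,  # supervised fine-tuning data for caption models
    "mcts_vcb": 1765,  # MCTS-VCB evaluation benchmark samples
}

# The two subsets together should account for all 11,184 samples.
total = sum(subsets.values())
print(total)  # → 11184
```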

