HyperAI

Video-based Generative Performance Benchmarking is an evaluation task that assesses the generative performance of video dialogue models on five aspects: information accuracy, detail orientation, context understanding, temporal understanding, and consistency. The test set is built from the ActivityNet-200 dataset, whose videos come with rich, dense descriptions and human-annotated question-answer pairs. A scoring pipeline based on the GPT-3.5 model rates each generated prediction on a relative scale from 1 to 5. The resulting scores give a common basis for comparing video dialogue models and for guiding their optimization toward real-world use.
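As a rough illustration of how per-prediction scores on the five aspects might be summarized, the sketch below averages 1 to 5 judge scores per aspect. It is a minimal sketch under stated assumptions, not the benchmark's actual pipeline: the record layout, the aspect identifiers, and the `aggregate_scores` helper are hypothetical, and the call to the GPT-3.5 judge is left out entirely.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical identifiers for the five aspects named in the description.
ASPECTS = (
    "information_accuracy",
    "detail_orientation",
    "context_understanding",
    "temporal_understanding",
    "consistency",
)


def aggregate_scores(records):
    """Average judge scores per aspect.

    Each record is assumed to be a dict with an "aspect" key (one of
    ASPECTS) and a "score" key (a number from 1 to 5). This layout is an
    assumption made for illustration only.
    """
    by_aspect = defaultdict(list)
    for record in records:
        aspect = record["aspect"]
        score = record["score"]
        if aspect not in ASPECTS:
            raise ValueError(f"unknown aspect: {aspect!r}")
        if not 1 <= score <= 5:
            raise ValueError(f"score out of range 1-5: {score!r}")
        by_aspect[aspect].append(score)
    # Only aspects that received at least one score appear in the result.
    return {aspect: mean(scores) for aspect, scores in by_aspect.items()}
```

Reporting one mean per aspect, rather than a single pooled number, keeps a weakness in one area (for example temporal understanding) from being hidden by strong scores elsewhere.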

VideoInstruct

PLLaVA-34B
