HyperAI

Video-based Generative Performance Benchmarking (Correctness of Information) is a benchmark that evaluates how factually accurate the responses of generative video dialogue models are. The test set is built on the ActivityNet-200 dataset, using rich, dense descriptive captions and human-annotated question-answer pairs. An evaluation pipeline built on the GPT-3.5 model assigns each generated prediction a relative score from 1 to 5, quantifying the correctness of information in video dialogue so that models can be compared and improved.
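The scoring step described above can be sketched roughly as follows. This is a minimal illustration, not the benchmark's actual pipeline: the prompt wording, the function names (`build_prompt`, `parse_score`, `average_score`), and the reply format are all assumptions, and the judge model is passed in as a plain callable so that no particular API client is implied.

```python
import re
from statistics import mean
from typing import Callable, Iterable, Tuple

# NOTE: everything here is hypothetical. The real benchmark's prompt text
# and reply format are not given in the description above.

def build_prompt(question: str, reference: str, prediction: str) -> str:
    """Assemble an illustrative judging prompt for one question-answer pair."""
    return (
        "Rate the factual correctness of the predicted answer against the "
        "reference answer on an integer scale from 1 to 5.\n"
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Predicted answer: {prediction}\n"
        "Reply with the score only."
    )

def parse_score(reply: str) -> int:
    """Extract the first standalone integer in the range 1-5 from a reply."""
    match = re.search(r"\b([1-5])\b", reply)
    if match is None:
        raise ValueError(f"no score in range 1-5 found in reply: {reply!r}")
    return int(match.group(1))

def average_score(
    samples: Iterable[Tuple[str, str, str]],
    judge: Callable[[str], str],
) -> float:
    """Score each (question, reference, prediction) triple and average.

    `judge` stands in for a call to the judge model (GPT-3.5 in the
    benchmark); it takes a prompt string and returns the model's reply.
    """
    scores = [
        parse_score(judge(build_prompt(question, reference, prediction)))
        for question, reference, prediction in samples
    ]
    return mean(scores)
```

In use, `judge` would wrap whatever client sends the prompt to the judge model; for a dry run it can be replaced by a stub such as `lambda prompt: "4"`. Averaging the per-sample scores is one common way to report a single number per model, assumed here for illustration.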

VideoInstruct

ST-LLM
