HyperAI

Video Based Generative Performance 3

"Video-based Generative Performance Benchmarking (Contextual Understanding)" is a benchmarking task designed to evaluate the performance of generative video dialogue models in contextual understanding. This task is based on the ActivityNet-200 dataset, constructing a test set with rich dense descriptive captions and human-annotated question-answer pairs. It uses the GPT-3.5 model to score the generated predictions, aiming to comprehensively measure the model's understanding of video content and its generative capabilities, thereby promoting the performance optimization and application development of video dialogue systems.