Video-based Generative Performance Benchmarking (Consistency)
"Video-based Generative Performance Benchmarking (Consistency)" is a benchmarking task designed to evaluate the consistency of generative video dialogue models. This task is based on the ActivityNet-200 dataset, which constructs the test set through rich dense descriptive captions and human-annotated question-answer pairs. An evaluation pipeline developed using the GPT-3.5 model is utilized to provide a relative score of 1-5 for the generated predictions. The aim is to measure the model's ability to maintain information consistency and logical coherence across multiple rounds of dialogue, providing crucial references for optimizing the performance of video dialogue systems.