Video Based Generative Performance 2
Video-based Generative Performance Benchmarking (Detail Orientation) is a benchmark for evaluating how well generative video dialogue models handle detail. It is built on the ActivityNet-200 dataset: the test set is constructed from rich, dense, human-annotated descriptive captions and their associated question-answer pairs. The evaluation pipeline uses the GPT-3.5 model to assign each generated prediction a relative score on a scale of 1 to 5. The goal is to help improve how accurately and coherently models understand and describe details, giving developers a reference point for optimizing video dialogue systems.
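The scoring step described above can be illustrated with a minimal sketch. It is not the benchmark's actual pipeline: the prompt wording, the function names (`build_prompt`, `parse_score`, `average_score`), and the `judge` callable are all assumptions made for illustration. The `judge` argument stands in for whatever wraps the GPT-3.5 call, so that the aggregation logic can be shown without depending on a particular client library.

```python
import re
from statistics import mean
from typing import Callable, Iterable, Optional, Tuple

# One sample is (question, reference answer, model prediction).
Sample = Tuple[str, str, str]


def build_prompt(question: str, reference: str, prediction: str) -> str:
    """Assemble a judge prompt. The wording is hypothetical, not the benchmark's."""
    return (
        "Rate how well the predicted answer captures the details of the "
        "correct answer, as an integer from 1 to 5.\n"
        f"Question: {question}\n"
        f"Correct answer: {reference}\n"
        f"Predicted answer: {prediction}\n"
        "Reply with the score only."
    )


def parse_score(reply: str) -> Optional[int]:
    """Return the first standalone digit 1-5 in the reply, or None if absent."""
    match = re.search(r"\b([1-5])\b", reply)
    return int(match.group(1)) if match else None


def average_score(samples: Iterable[Sample], judge: Callable[[str], str]) -> float:
    """Average the 1-5 scores over all samples whose judge reply could be parsed."""
    scores = []
    for question, reference, prediction in samples:
        score = parse_score(judge(build_prompt(question, reference, prediction)))
        if score is not None:  # skip replies that contain no usable score
            scores.append(score)
    return mean(scores) if scores else float("nan")
```

In this sketch, replies that contain no score in the 1-5 range are skipped rather than counted as a minimum score; a real pipeline might instead retry the request or log the failure, which is a design choice the benchmark description does not specify.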