Video-Adverb Retrieval (Unseen Compositions) | SOTA | HyperAI

The Video-Adverb Retrieval (Unseen Compositions) task aims to identify adverb-action combinations in videos when those combinations were not seen during training. A model must go beyond the adverb-action patterns it already knows and capture new semantic relationships, which tests how well a computer vision system generalizes and how deeply it understands video content. Applications include video retrieval, behavior analysis, and natural language processing.
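As a rough illustration of what "unseen compositions" means, the sketch below holds out chosen adverb-action pairs so that they appear only in the test set. The function names (`split_unseen_compositions`, `check_split`), the sample tuple layout, and the toy data are all hypothetical and are not taken from any benchmark's actual split procedure. The extra check that each held-out action and adverb still appears individually in training is likewise an assumption about how such a split would typically be set up.

```python
def split_unseen_compositions(samples, held_out):
    """Split (video_id, action, adverb) samples by composition.

    Any sample whose (action, adverb) pair is in `held_out` goes to test;
    everything else goes to train. Hypothetical helper for illustration.
    """
    held_out = set(held_out)
    train = [s for s in samples if (s[1], s[2]) not in held_out]
    test = [s for s in samples if (s[1], s[2]) in held_out]
    return train, test


def check_split(train, test):
    """Return True if test pairs are unseen but their parts are seen in train."""
    train_pairs = {(action, adverb) for _, action, adverb in train}
    test_pairs = {(action, adverb) for _, action, adverb in test}
    train_actions = {action for action, _ in train_pairs}
    train_adverbs = {adverb for _, adverb in train_pairs}
    # No composition may appear on both sides of the split.
    no_overlap = train_pairs.isdisjoint(test_pairs)
    # Assumption: each action and adverb is still seen on its own in training.
    parts_seen = all(
        action in train_actions and adverb in train_adverbs
        for action, adverb in test_pairs
    )
    return no_overlap and parts_seen


# Toy data, invented for this example only.
samples = [
    ("v1", "walk", "slowly"),
    ("v2", "walk", "quickly"),
    ("v3", "cut", "slowly"),
    ("v4", "cut", "quickly"),
    ("v5", "walk", "slowly"),
]
train, test = split_unseen_compositions(samples, held_out=[("cut", "slowly")])
print(len(train), len(test), check_split(train, test))
```

Here the model would see "cut" and "slowly" during training, but never together, so retrieving "slowly" for a "cut" video at test time requires composing the two.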

MSR-VTT Adverbs

ActivityNet Adverbs