MUVR: Multimodal Untrimmed Video Retrieval Benchmark
MUVR is a benchmark dataset for multimodal untrimmed video retrieval, released in 2025 by Nanjing University of Aeronautics and Astronautics in collaboration with Nanjing University and Hong Kong Polytechnic University. The accompanying paper, "MUVR: A Multi-Modal Untrimmed Video Retrieval Benchmark with Multi-Level Visual Correspondence", was accepted to the NeurIPS 2025 Datasets and Benchmarks Track. The benchmark aims to advance video retrieval research for long-video platforms.
The dataset contains approximately 53,000 untrimmed videos collected from Bilibili, covering common video types such as news, travel, and dance, together with 1,050 multimodal queries and 84,000 query-video matching pairs. To distinguish different levels of matching, the dataset defines six visual correspondence levels (copy, event, scene, instance, action, and others) and adopts a one-to-many retrieval setting, in which each query can match multiple complete videos that contain relevant content. Query formats include long text descriptions, video tag prompts, and mask prompts, which together express fine-grained retrieval needs.
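To make the one-to-many setting concrete, the following is a minimal Python sketch of how a query with several relevant videos, each tagged with one of the six correspondence levels, could be represented and scored. The `Query` record, its field names, and the use of mean average precision (mAP) as the metric are assumptions for illustration only; this description does not specify MUVR's file format or its official evaluation protocol.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Correspondence(Enum):
    # The six visual correspondence levels named in the MUVR description.
    COPY = "copy"
    EVENT = "event"
    SCENE = "scene"
    INSTANCE = "instance"
    ACTION = "action"
    OTHERS = "others"


@dataclass
class Query:
    # Hypothetical record layout; not MUVR's actual annotation format.
    query_id: str
    text: str
    # One-to-many: every relevant untrimmed video mapped to its correspondence level.
    relevant: dict[str, Correspondence] = field(default_factory=dict)


def average_precision(ranked_ids: list[str], relevant_ids: set[str]) -> float:
    """Non-interpolated AP of a ranked video list against a set of relevant videos."""
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, vid in enumerate(ranked_ids, start=1):
        if vid in relevant_ids:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant hit
    # Relevant videos never retrieved contribute zero precision.
    return precision_sum / len(relevant_ids)


def mean_average_precision(
    queries: list[Query],
    rankings: dict[str, list[str]],
    levels: set[Correspondence] | None = None,
) -> float:
    """mAP over queries, optionally counting only videos at the given levels."""
    aps = []
    for q in queries:
        rel = {v for v, lvl in q.relevant.items() if levels is None or lvl in levels}
        if not rel:
            continue  # skip queries with no relevant video at the requested levels
        aps.append(average_precision(rankings.get(q.query_id, []), rel))
    return sum(aps) / len(aps) if aps else 0.0


if __name__ == "__main__":
    q = Query("q1", "street dance performance",
              {"v1": Correspondence.COPY, "v3": Correspondence.SCENE})
    rankings = {"q1": ["v1", "v2", "v3"]}
    print(mean_average_precision([q], rankings))                          # all levels
    print(mean_average_precision([q], rankings, {Correspondence.SCENE}))  # scene only
```

The optional `levels` filter reflects the idea that a single ranked list can be judged more strictly (for example, copies only) or more loosely (any correspondence level); how MUVR actually groups its levels during evaluation should be checked against the paper.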
