Zero-Shot Video Retrieval on VATEX
Metrics
text-to-video R@1
text-to-video R@10
video-to-text R@1
video-to-text R@10
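
As an illustration of what the R@K metrics above measure, here is a minimal sketch of Recall@K computed from a query-by-candidate similarity matrix. It is not the evaluation code of any listed paper. The function name `recall_at_k`, the toy matrix, and the assumption that each query has exactly one correct match located on the diagonal are all hypothetical simplifications; VATEX provides several captions per video, so real evaluation protocols handle the matching differently.

```python
import numpy as np

def recall_at_k(similarity: np.ndarray, k: int) -> float:
    """Percentage of queries whose correct candidate ranks in the top k.

    Assumption: similarity[i, j] scores query i against candidate j, and
    the correct candidate for query i is at index i (one match per query).
    """
    # Score of the correct candidate for each query (the diagonal).
    correct = np.diag(similarity)[:, None]
    # Zero-based rank: how many candidates score strictly higher.
    ranks = (similarity > correct).sum(axis=1)
    return 100.0 * float((ranks < k).mean())

# Toy text-to-video similarity matrix (rows: captions, columns: videos).
# The numbers are made up for illustration only.
sim = np.array([
    [0.9, 0.1, 0.2],
    [0.3, 0.2, 0.8],
    [0.1, 0.7, 0.6],
])

t2v_r1 = recall_at_k(sim, 1)    # text-to-video: rank videos per caption
v2t_r1 = recall_at_k(sim.T, 1)  # video-to-text: rank captions per video
print(f"text-to-video R@1: {t2v_r1:.1f}, video-to-text R@1: {v2t_r1:.1f}")
```

Transposing the matrix swaps the roles of query and candidate, which is why the same function covers both retrieval directions in this sketch.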
Results
Performance of different models on this benchmark
| Model | text-to-video R@1 | text-to-video R@10 | video-to-text R@1 | video-to-text R@10 | Paper Title |
|---|---|---|---|---|---|
| GRAM | 83.9 | 99.5 | 82.7 | 99 | Gramian Multimodal Representation Learning and Alignment |
| InternVideo2-6B | 71.5 | 97.1 | 85.3 | 99.3 | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding |
| InternVideo2-1B | 70.4 | 96.9 | 85.4 | 99.1 | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding |
| VideoCoCa | 53.2 | 90.1 | 73.6 | 97.2 | VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners |
| InternVideo | 49.5 | - | 69.5 | - | InternVideo: General Video Foundation Models via Generative and Discriminative Learning |