Zero-Shot Video Retrieval on VATEX
Evaluation Metrics
text-to-video R@1
text-to-video R@10
video-to-text R@1
video-to-text R@10
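R@K (Recall at K) is the percentage of queries whose correct match appears among the top K retrieved items. The following is a minimal sketch of how it could be computed from a query-by-candidate similarity matrix. It assumes a one-to-one pairing in which query i matches candidate i; the function names and the toy scores are hypothetical, and the protocol the listed papers actually use on VATEX may differ (for example, in how multiple captions per video are handled).

```python
def recall_at_k(similarity, k):
    """Return the percentage of queries whose correct candidate is in the top k.

    similarity[i][j] is the score between query i and candidate j.
    Assumption: the correct candidate for query i is candidate i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Zero-based rank of the correct candidate: how many candidates score strictly higher.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(similarity)


def transpose(matrix):
    """Swap queries and candidates (text-to-video becomes video-to-text)."""
    return [list(column) for column in zip(*matrix)]


# Toy text-to-video similarity matrix (made-up scores, rows = text queries).
sim = [
    [0.9, 0.1, 0.2],
    [0.3, 0.2, 0.8],
    [0.1, 0.4, 0.7],
]

t2v_r1 = recall_at_k(sim, 1)             # text-to-video R@1
v2t_r1 = recall_at_k(transpose(sim), 1)  # video-to-text R@1
v2t_r2 = recall_at_k(transpose(sim), 2)  # video-to-text R@2
```

Transposing the matrix reuses the same function for the opposite retrieval direction, which is why the two directions can report different scores from the same set of similarities.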
Results
Performance of each model on this benchmark
Model | text-to-video R@1 | text-to-video R@10 | video-to-text R@1 | video-to-text R@10 | Paper Title | Repository |
---|---|---|---|---|---|---|
GRAM | 83.9 | 99.5 | 82.7 | 99 | Gramian Multimodal Representation Learning and Alignment | |
VideoCoCa | 53.2 | 90.1 | 73.6 | 97.2 | VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners | - |
InternVideo2-6B | 71.5 | 97.1 | 85.3 | 99.3 | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | |
InternVideo | 49.5 | - | 69.5 | - | InternVideo: General Video Foundation Models via Generative and Discriminative Learning | |
InternVideo2-1B | 70.4 | 96.9 | 85.4 | 99.1 | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | |