Zero-Shot Video Retrieval on MSR-VTT-full
Evaluation Metrics
text-to-video R@1
text-to-video R@10
text-to-video R@5
video-to-text R@1
video-to-text R@10
video-to-text R@5
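As a rough illustration of how the R@K (recall at K) metrics listed above are commonly computed, the sketch below scores a query-by-candidate similarity matrix. It assumes a one-to-one pairing in which the correct candidate for query i is candidate i; benchmarks with several captions per video need a different ground-truth mapping, so this may not match the exact protocol behind the reported numbers. The names `recall_at_k`, `transpose`, and `toy_sim` are hypothetical, and the scores are made up for the example.

```python
def recall_at_k(sim, k):
    """Percentage of queries whose correct candidate ranks in the top k.

    sim[i][j] is the similarity of query i to candidate j.
    Assumption: the correct candidate for query i is candidate i.
    """
    hits = 0
    for i, row in enumerate(sim):
        # Zero-based rank = number of candidates scoring strictly higher
        # than the correct one.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(sim)


def transpose(sim):
    """Swap query and candidate roles (text-to-video <-> video-to-text)."""
    return [list(col) for col in zip(*sim)]


# Toy scores, not taken from any model in the table.
# Rows are text queries, columns are videos.
toy_sim = [
    [0.9, 0.1, 0.2],
    [0.3, 0.2, 0.8],
    [0.1, 0.7, 0.6],
]

t2v_r1 = recall_at_k(toy_sim, 1)             # text-to-video R@1
v2t_r5 = recall_at_k(transpose(toy_sim), 5)  # video-to-text R@5
```

Transposing the matrix reuses the same function for both retrieval directions; higher values are better for every metric.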
Evaluation Results
Performance of each model on this benchmark
Model Name | text-to-video R@1 | text-to-video R@10 | text-to-video R@5 | video-to-text R@1 | video-to-text R@10 | video-to-text R@5 | Paper Title | Repository |
---|---|---|---|---|---|---|---|---|
InternVL-C | 44.7 | 78.4 | 68.2 | 40.2 | 74.1 | 63.1 | InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks | |
InternVL-G | 46.3 | 79.6 | 70.5 | 42.4 | 75.4 | 65.9 | InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks | |
VideoCoCa | 34.3 | 67.0 | 57.8 | 64.7 | 91.4 | 85.2 | VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners | - |