Zero-Shot Video Retrieval on MSR-VTT-full

text-to-video R@1

text-to-video R@10

text-to-video R@5

video-to-text R@1

video-to-text R@10

video-to-text R@5
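
R@K (recall at K) is the percentage of queries whose correct match appears among the top K retrieved candidates. The sketch below shows one common way to compute it from a similarity matrix. It is an illustration only, not the evaluation code used by any listed model: the function name `recall_at_k`, the toy matrix, and the assumption of exactly one ground-truth candidate per query (query i matches candidate i) are all hypothetical.

```python
import numpy as np

def recall_at_k(sim, k):
    """Recall@K in percent.

    Assumes sim[i, j] is the similarity between query i and candidate j,
    and that candidate i is the single correct match for query i.
    """
    sim = np.asarray(sim, dtype=float)
    # Score of the correct candidate for each query (the diagonal).
    correct = np.diag(sim)
    # Zero-based rank: how many candidates score strictly higher than the correct one.
    ranks = (sim > correct[:, None]).sum(axis=1)
    # A query counts as a hit when its correct candidate is within the top k.
    return 100.0 * float((ranks < k).mean())

# Toy example: rows are text queries, columns are videos (made-up numbers).
sim = np.array([
    [0.9, 0.1, 0.2],
    [0.8, 0.3, 0.5],
    [0.2, 0.7, 0.6],
])

t2v_r1 = recall_at_k(sim, 1)    # text-to-video: rank videos for each text
v2t_r1 = recall_at_k(sim.T, 1)  # video-to-text: rank texts for each video
```

Transposing the matrix swaps the roles of query and candidate, which is why the same function serves both retrieval directions. Benchmarks with several captions per video need a protocol-specific definition of the correct match, which this sketch does not cover.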

Evaluation Results

Performance of each model on this benchmark

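Model	text-to-video R@1	text-to-video R@10	text-to-video R@5	video-to-text R@1	video-to-text R@10	video-to-text R@5	Paper Title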
InternVL-G	46.3	79.6	70.5	42.4	75.4	65.9	InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
InternVL-C	44.7	78.4	68.2	40.2	74.1	63.1	InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
VideoCoCa	34.3	67.0	57.8	64.7	91.4	85.2	VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners
