HyperAI

Zero-Shot Video Retrieval on MSR-VTT-full

Metrics

text-to-video R@1
text-to-video R@10
text-to-video R@5
video-to-text R@1
video-to-text R@10
video-to-text R@5
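
The page does not define R@K. Assuming the usual retrieval convention, R@K (Recall at K) is the percentage of queries for which at least one correct item appears among the K highest-scoring candidates. The sketch below illustrates that definition on a toy similarity matrix; the function name `recall_at_k`, its arguments, and the toy scores are hypothetical and are not taken from any of the listed papers or from the benchmark's evaluation code.

```python
from typing import Sequence, Set


def recall_at_k(
    similarity: Sequence[Sequence[float]],
    relevant: Sequence[Set[int]],
    k: int,
) -> float:
    """Percentage of queries with at least one relevant candidate in the top k.

    similarity[i][j] is the score between query i and candidate j (higher is better).
    relevant[i] is the set of candidate indices that count as correct for query i.
    """
    if not similarity:
        raise ValueError("at least one query is required")
    if len(similarity) != len(relevant):
        raise ValueError("one relevance set is required per query")
    hits = 0
    for scores, correct in zip(similarity, relevant):
        # Rank candidate indices by descending score; ties keep index order.
        ranked = sorted(range(len(scores)), key=lambda j: -scores[j])
        if correct.intersection(ranked[:k]):
            hits += 1
    return 100.0 * hits / len(similarity)


# Toy data (illustrative only): 3 text queries scored against 3 videos,
# with exactly one correct video per query.
toy_similarity = [
    [0.9, 0.1, 0.3],  # query 0: correct video 0 is ranked first
    [0.2, 0.4, 0.8],  # query 1: correct video 1 is ranked second
    [0.7, 0.6, 0.1],  # query 2: correct video 2 is ranked third
]
toy_relevant = [{0}, {1}, {2}]

r1 = recall_at_k(toy_similarity, toy_relevant, k=1)
r2 = recall_at_k(toy_similarity, toy_relevant, k=2)
r3 = recall_at_k(toy_similarity, toy_relevant, k=3)
```

For text-to-video retrieval the queries are captions and the candidates are videos; for video-to-text the roles are swapped. Passing a set of correct indices per query lets the same function handle a video that has several matching captions.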

Results

Performance of different models on this benchmark

| Model name | text-to-video R@1 | text-to-video R@10 | text-to-video R@5 | video-to-text R@1 | video-to-text R@10 | video-to-text R@5 | Paper Title | Repository |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| InternVL-C | 44.7 | 78.4 | 68.2 | 40.2 | 74.1 | 63.1 | InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks | |
| InternVL-G | 46.3 | 79.6 | 70.5 | 42.4 | 75.4 | 65.9 | InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks | |
| VideoCoCa | 34.3 | 67.0 | 57.8 | 64.7 | 91.4 | 85.2 | VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners | - |