Video Retrieval on YouCook2
Evaluation Metrics
text-to-video Median Rank
text-to-video R@1
text-to-video R@10
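For readers unfamiliar with these metrics, the following is a minimal sketch of how text-to-video R@K and Median Rank are commonly computed from a query-by-video similarity matrix. It assumes that the correct video for text query `i` is video `i`, and that ties are broken in favor of the correct video. The function names and the toy similarity matrix are illustrative only and are not taken from this benchmark or from any listed model's code. Higher R@K is better; lower Median Rank is better.

```python
from statistics import median


def retrieval_ranks(sim):
    """Return the 1-based rank of the correct video for each text query.

    sim[i][j] is the similarity between text query i and video j.
    Assumption: the ground-truth video for query i is video i.
    """
    ranks = []
    for i, row in enumerate(sim):
        target = row[i]
        # Rank = 1 + number of videos scored strictly higher than the correct one.
        ranks.append(1 + sum(1 for s in row if s > target))
    return ranks


def recall_at_k(ranks, k):
    """Percentage of queries whose correct video appears in the top k."""
    return 100.0 * sum(1 for r in ranks if r <= k) / len(ranks)


def median_rank(ranks):
    """Median position of the correct video across all queries."""
    return median(ranks)


# Toy 4x4 similarity matrix (hypothetical values, for illustration only).
sim = [
    [0.9, 0.1, 0.3, 0.2],
    [0.8, 0.5, 0.1, 0.6],
    [0.2, 0.4, 0.7, 0.1],
    [0.6, 0.7, 0.8, 0.3],
]

ranks = retrieval_ranks(sim)
print("ranks:", ranks)
print("R@1:", recall_at_k(ranks, 1))
print("R@10:", recall_at_k(ranks, 10))
print("Median Rank:", median_rank(ranks))
```

Published evaluation scripts may differ in tie handling and in how multiple captions per video are treated, so reported numbers are not guaranteed to follow this exact procedure.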
Evaluation Results
Performance of each model on this benchmark
Comparison Table
Model Name | text-to-video Median Rank | text-to-video R@1 | text-to-video R@10 |
---|---|---|---|
coot-cooperative-hierarchical-transformer-for | 9 | 16.7 | 52.3 |
howto100m-learning-a-text-video-embedding-by | 24 | 8.2 | 35.3 |
associating-neural-word-embeddings-with-deep | 75 | 4.6 | 21.6 |
semantic-role-aware-correlation-transformer | 77 | 5.3 | 20.8 |
rome-role-aware-mixture-of-expert-transformer | 53 | 6.3 | 25.2 |
videoclip-contrastive-pre-training-for-zero | - | 32.2 | 75.0 |
vlm-task-agnostic-video-language-model-pre | 4 | 27.05 | 69.38 |
taco-token-aware-cascade-contrastive-learning | 4 | 29.6 | 72.7 |
vast-a-vision-audio-subtitle-text-omni-1 | - | 50.4 | 80.8 |
omnivec-learning-robust-representations-with | - | - | 64.2 |
meltr-meta-loss-transformer-for-learning-to | 3 | 33.7 | 74.8 |
omnivec-learning-robust-representations-with | - | - | 70.8 |
univilm-a-unified-video-and-language-pre | 4 | 28.9 | 70.0 |
videoclip-contrastive-pre-training-for-zero | - | 22.7 | 63.1 |
video-text-modeling-with-zero-shot-transfer | - | 21.7 | 55.2 |
mdmmt-2-multidomain-multimodal-transformer | 3.0 | 32.0 | 74.8 |