HyperAI초신경
Video Retrieval On Vatex
Evaluation metrics
text-to-video R@1
text-to-video R@10
text-to-video R@5
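
As a rough illustration of what these metrics measure: text-to-video R@K is commonly computed as the percentage of text queries for which the correct video appears among the top K retrieved videos. The sketch below is not taken from this page or from any listed paper. It assumes a hypothetical square similarity matrix in which `similarity[i][j]` scores text `i` against video `j`, and that video `i` is the single correct match for text `i`. Benchmarks with several captions per video need a mapping from each caption to its video instead. The function name `recall_at_k` and the toy numbers are illustrative only.

```python
def recall_at_k(similarity, k):
    """Percentage of text queries whose correct video ranks in the top k.

    Assumption: similarity[i][j] scores text i against video j, and
    video i is the only correct match for text i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Zero-based rank of the correct video: how many videos score strictly higher.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(similarity)


# Toy 3x3 similarity matrix with made-up scores, for illustration only.
sim = [
    [0.9, 0.1, 0.3],  # text 0: correct video is ranked first
    [0.8, 0.5, 0.2],  # text 1: correct video is ranked second
    [0.4, 0.7, 0.6],  # text 2: correct video is ranked second
]

r_at_1 = recall_at_k(sim, 1)
r_at_2 = recall_at_k(sim, 2)
print(f"R@1 = {r_at_1:.1f}, R@2 = {r_at_2:.1f}")
```

Ties are resolved in favour of the correct video here, which is one possible convention; published evaluation scripts may handle ties differently.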
Evaluation results
Performance of each model on this benchmark
| Model | text-to-video R@1 | text-to-video R@10 | text-to-video R@5 | Paper Title |
| --- | --- | --- | --- | --- |
| VAST | 83.0 | 99.2 | 98.2 | VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset |
| QB-Norm+CLIP2Video | 58.8 | 93.8 | - | Cross Modal Retrieval with Querybank Normalisation |
| CLIP2Video | 57.3 | 90 | - | CLIP2Video: Mastering Video-Text Retrieval via Image CLIP |
| Side4Video | 68.8 | 97.0 | 93.5 | Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning |
| VALOR | 78.5 | 98.7 | 97.1 | VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset |
| Cap4Video | 66.6 | 97.0 | 93.1 | Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval? |
| InternVideo2-6B | 75.5 | - | - | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding |
| GRAM | 87.7 | 100 | - | Gramian Multimodal Representation Learning and Alignment |
| TS2-Net | 59.1 | 95.2 | - | TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval |
| LAFF | 59.1 | 91.7 | - | Lightweight Attentional Feature Fusion: A New Baseline for Text-to-Video Retrieval |
| Unmasked Teacher | 72 | 97.8 | 95.1 | Unmasked Teacher: Towards Training-Efficient Video Foundation Models |
| InternVideo | 71.1 | - | - | InternVideo: General Video Foundation Models via Generative and Discriminative Learning |
| TeachCLIP | 63.6 | 96.1 | 91.9 | Holistic Features are almost Sufficient for Text-to-Video Retrieval |