Video Retrieval on YouCook2
Evaluation Metrics
text-to-video Median Rank
text-to-video R@1
text-to-video R@10
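The page does not define these metrics. As a minimal sketch, assuming the common text-to-video retrieval convention in which each text query has exactly one ground-truth clip, R@K is the percentage of queries whose correct clip is ranked in the top K, and Median Rank is the median position of the correct clip. The function names, the toy similarity matrix, and the optimistic tie handling below are illustrative assumptions, not taken from any of the listed papers.

```python
from statistics import median

def retrieval_ranks(sim):
    """Return the 1-based rank of the correct video for each text query.

    Assumption: sim[i][j] is the similarity between text query i and
    video j, and the ground-truth video for query i is video i.
    """
    ranks = []
    for i, row in enumerate(sim):
        target = row[i]
        # Rank is 1 plus the number of videos scored strictly higher
        # than the correct one (ties are resolved optimistically).
        ranks.append(1 + sum(1 for s in row if s > target))
    return ranks

def recall_at_k(ranks, k):
    """Percentage of queries whose correct video is ranked in the top k."""
    return 100.0 * sum(1 for r in ranks if r <= k) / len(ranks)

def median_rank(ranks):
    """Median rank of the correct video (lower is better)."""
    return median(ranks)

# Toy 3x3 similarity matrix (hypothetical values).
sim = [
    [0.9, 0.1, 0.3],
    [0.8, 0.2, 0.5],
    [0.1, 0.4, 0.7],
]
ranks = retrieval_ranks(sim)
print(ranks, recall_at_k(ranks, 1), recall_at_k(ranks, 10), median_rank(ranks))
```

Under this convention, higher R@1 and R@10 are better, while a lower Median Rank is better.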
Evaluation Results
Performance of each model on this benchmark
| Model | text-to-video Median Rank | text-to-video R@1 | text-to-video R@10 | Paper Title |
| --- | --- | --- | --- | --- |
| VAST | - | 50.4 | 80.8 | VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset |
| VideoCLIP | - | 32.2 | 75.0 | VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding |
| UniVL + MELTR | 3 | 33.7 | 74.8 | MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models |
| MDMMT-2 | 3.0 | 32.0 | 74.8 | MDMMT-2: Multidomain Multimodal Transformer for Video Retrieval, One More Step Towards Generalization |
| TACo | 4 | 29.6 | 72.7 | TACo: Token-aware Cascade Contrastive Learning for Video-Text Alignment |
| OmniVec | - | - | 70.8 | OmniVec: Learning robust representations with cross modal sharing |
| UniVL | 4 | 28.9 | 70.0 | UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation |
| VLM | 4 | 27.05 | 69.38 | VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding |
| OmniVec (pretrained) | - | - | 64.2 | OmniVec: Learning robust representations with cross modal sharing |
| VideoCLIP (zero-shot) | - | 22.7 | 63.1 | VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding |
| VideoCoCa (zero-shot) | - | 21.7 | 55.2 | VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners |
| COOT | 9 | 16.7 | 52.3 | COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning |
| Text-Video Embedding | 24 | 8.2 | 35.3 | HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips |
| RoME | 53 | 6.3 | 25.2 | RoME: Role-aware Mixture-of-Expert Transformer for Text-to-Video Retrieval |
| HGLMM FV CCA | 75 | 4.6 | 21.6 | Associating Neural Word Embeddings With Deep Image Representations Using Fisher Vectors |
| Satar et al. | 77 | 5.3 | 20.8 | Semantic Role Aware Correlation Transformer for Text to Video Retrieval |