HyperAI초신경

Zero-Shot Video Retrieval on YouCook2

Evaluation Metrics

text-to-video Median Rank

text-to-video R@1

text-to-video R@10

text-to-video R@5
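
To make the metrics above concrete: R@K is the percentage of text queries whose correct video appears in the top K retrieved results (higher is better), and Median Rank is the median position of the correct video across all queries (lower is better). The following is a minimal sketch of how these could be computed, assuming a text-by-video similarity matrix in which video i is the single ground-truth match for query i. The function names and the toy scores are hypothetical illustrations, not taken from the evaluation code of any model listed here.

```python
from statistics import median


def retrieval_ranks(similarity):
    """Return the 1-based rank of the correct video for each text query.

    Assumption: similarity[i][j] scores query i against video j, and
    video i is the only ground-truth match for query i.
    """
    ranks = []
    for i, row in enumerate(similarity):
        target = row[i]
        # Rank = 1 + number of videos scored strictly higher than the match.
        ranks.append(1 + sum(1 for score in row if score > target))
    return ranks


def recall_at_k(ranks, k):
    """Percentage of queries whose correct video is within the top k."""
    return 100.0 * sum(1 for r in ranks if r <= k) / len(ranks)


def median_rank(ranks):
    """Median position of the correct video over all queries."""
    return median(ranks)


# Toy 4x4 similarity matrix (made-up scores, for illustration only).
toy_similarity = [
    [0.9, 0.1, 0.2, 0.3],  # correct video ranked 1st
    [0.8, 0.5, 0.1, 0.2],  # correct video ranked 2nd
    [0.1, 0.2, 0.7, 0.3],  # correct video ranked 1st
    [0.6, 0.7, 0.8, 0.4],  # correct video ranked 4th
]
toy_ranks = retrieval_ranks(toy_similarity)

print("R@1:", recall_at_k(toy_ranks, 1))
print("R@5:", recall_at_k(toy_ranks, 5))
print("Median Rank:", median_rank(toy_ranks))
```

Ties are resolved optimistically in this sketch (only strictly higher scores push the correct video down); published evaluation scripts may handle ties differently.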

Evaluation Results

Performance of each model on this benchmark

Model	text-to-video Median Rank	text-to-video R@1	text-to-video R@10	text-to-video R@5	Paper Title	Repository
VAST, HowToCaption-finetuned	8	19.7	53.9	43.6	HowToCaption: Prompting LLMs to Transform Video Annotations at Scale
VideoCoCa	-	20.3	53.3	43.0	VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners	-
TACo	-	19.9	55.7	43.2	TACo: Token-aware Cascade Contrastive Learning for Video-Text Alignment	-
OmniVec2	-	26.1	70.8	54.1	OmniVec2 - A Novel Transformer based Network for Large Scale Multimodal and Multitask Learning	-
VideoCLIP	-	22.7	63.1	50.4	VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
HowToCaption	15	13.4	44.1	33.1	HowToCaption: Prompting LLMs to Transform Video Annotations at Scale
VATT-MBS	-	-	45.5	-	VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
MIL-NCE	-	15.1	51.2	38.0	End-to-End Learning of Visual Representations from Uncurated Instructional Videos
Norton	-	24.2	64.1	51.9	Multi-granularity Correspondence Learning from Long-term Noisy Videos
