Long Video Retrieval Background Removed On

Metrics

Cap. Avg. R@1

Cap. Avg. R@10

Cap. Avg. R@5

DTW R@1

DTW R@10

DTW R@5

OTAM R@1

OTAM R@10

OTAM R@5
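Each metric above combines an alignment or aggregation scheme (caption averaging, DTW, or OTAM) with Recall@K: the fraction of queries whose correct video ranks among the top K candidates. The sketch below illustrates only two generic ingredients, classic dynamic time warping (DTW) over cosine distance and Recall@K with the ground truth on the diagonal of a query-by-candidate distance matrix. It is a minimal sketch under stated assumptions: embeddings are plain lists of non-zero floats, the function names are hypothetical, and ties are broken in the query's favor. It is not the evaluation code used by any of the listed papers, and OTAM and caption averaging are not shown.

```python
from typing import List, Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity (assumes neither vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return 1.0 - dot / (norm_a * norm_b)


def dtw_distance(xs: List[Vector], ys: List[Vector]) -> float:
    """Classic DTW cost between two ordered sequences of embeddings."""
    n, m = len(xs), len(ys)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning xs[:i] with ys[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = cosine_distance(xs[i - 1], ys[j - 1])
            # extend the cheapest of: insertion, deletion, or diagonal match
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recall_at_k(dist: List[List[float]], k: int) -> float:
    """Fraction of queries whose ground-truth candidate (same index) is in the top k.

    dist[q][c] is the distance from query q to candidate c; lower is better.
    Hypothetical tie handling: only strictly closer candidates push the correct one down.
    """
    hits = 0
    for q, row in enumerate(dist):
        rank = sum(1 for d in row if d < row[q])  # 0-based rank of the correct candidate
        if rank < k:
            hits += 1
    return hits / len(dist)


# Toy usage: two caption sequences (queries) and two clip sequences (candidates).
captions = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
clips = [
    [[1.0, 0.1], [0.1, 1.0], [0.0, 1.0]],
    [[0.1, 1.0], [1.0, 0.0]],
]
dist = [[dtw_distance(c, v) for v in clips] for c in captions]
print(recall_at_k(dist, 1))
```

Because DTW enforces a monotonic alignment, a caption sequence only scores well against a clip sequence whose steps occur in the same order; in the toy example, each caption sequence is closest to its own clip sequence, so Recall@1 is 1.0.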

Results

Performance of various models on this benchmark

Model	Cap. Avg. R@1	Cap. Avg. R@10	Cap. Avg. R@5	DTW R@1	DTW R@10	DTW R@5	OTAM R@1	OTAM R@10	OTAM R@5	Paper Title
Norton	75.5	97.7	95.0	88.7	99.5	98.8	88.9	99.5	98.4	Multi-granularity Correspondence Learning from Long-term Noisy Videos
MCN	53.4	81.4	75.0	-	-	-	-	-	-	Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos
VideoCLIP	74.5	97.9	94.5	56.0	89.9	96.3	52.8	89.2	95.0	VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
MIL-NCE	43.1	79.1	68.6	-	-	-	-	-	-	End-to-End Learning of Visual Representations from Uncurated Instructional Videos
TempCLR	74.5	97.0	94.6	83.5	99.3	97.2	84.9	99.5	97.9	TempCLR: Temporal Alignment Representation with Contrastive Learning
Text-Video Embedding	46.6	83.7	74.3	-	-	-	-	-	-	HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips