Long Video Retrieval Background Removed On
Evaluation Metrics
Cap. Avg. R@1
Cap. Avg. R@10
Cap. Avg. R@5
DTW R@1
DTW R@10
DTW R@5
OTAM R@1
OTAM R@10
OTAM R@5
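Each metric above is a recall-at-K score (R@1, R@5, R@10): the percentage of queries whose correct match appears among the top K retrieved candidates. The following is a minimal sketch of that computation, not the benchmark's evaluation code. It assumes a precomputed similarity matrix in which the correct candidate for query i is candidate i. The function name `recall_at_k` and the toy numbers are hypothetical, and how the similarity itself is produced (Cap. Avg., DTW, or OTAM) is outside the sketch.

```python
def recall_at_k(similarity, k):
    """Percentage of queries whose ground-truth item ranks in the top k.

    Assumes similarity[i][j] scores query i against candidate j and that
    the correct candidate for query i is candidate i (an illustrative
    convention, not taken from the benchmark).
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Rank = number of candidates scoring strictly higher than the true match.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(similarity)


# Toy 3x3 similarity matrix (made-up numbers, for illustration only).
toy = [
    [0.9, 0.1, 0.3],  # query 0: true match ranked 1st
    [0.8, 0.6, 0.7],  # query 1: true match ranked 3rd
    [0.2, 0.5, 0.4],  # query 2: true match ranked 2nd
]

for k in (1, 2, 3):
    print(f"R@{k} = {recall_at_k(toy, k):.1f}")
```

Because the top-K set can only grow as K increases, R@1 ≤ R@5 ≤ R@10 must hold for any single model and metric family.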
Evaluation Results
Performance of each model on this benchmark
Model Name | Cap. Avg. R@1 | Cap. Avg. R@10 | Cap. Avg. R@5 | DTW R@1 | DTW R@10 | DTW R@5 | OTAM R@1 | OTAM R@10 | OTAM R@5 | Paper Title | Repository |
---|---|---|---|---|---|---|---|---|---|---|---|
Norton | 75.5 | 97.7 | 95.0 | 88.7 | 99.5 | 98.8 | 88.9 | 99.5 | 98.4 | Multi-granularity Correspondence Learning from Long-term Noisy Videos | |
MCN | 53.4 | 81.4 | 75.0 | - | - | - | - | - | - | Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos | |
VideoCLIP | 74.5 | 97.9 | 94.5 | 56.0 | 96.3 | 89.9 | 52.8 | 95.0 | 89.2 | VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding | |
MIL-NCE | 43.1 | 79.1 | 68.6 | - | - | - | - | - | - | End-to-End Learning of Visual Representations from Uncurated Instructional Videos | |
TempCLR | 74.5 | 97.0 | 94.6 | 83.5 | 99.3 | 97.2 | 84.9 | 99.5 | 97.9 | TempCLR: Temporal Alignment Representation with Contrastive Learning | |
Text-Video Embedding | 46.6 | 83.7 | 74.3 | - | - | - | - | - | - | HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips | |