Temporal Action Localization on CrossTask
Metrics
Recall
Results
Performance results of various models on this benchmark
Model Name | Recall | Paper Title | Repository |
---|---|---|---|
Alayrac | 13.3 | Unsupervised Learning from Narrated Instruction Videos | - |
VideoCLIP | 47.3 | VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding | |
VLM | 46.5 | VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding | |
TACo | 42.5 | TACo: Token-aware Cascade Contrastive Learning for Video-Text Alignment | - |
Text-Video Embedding | 33.6 | HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips | |
Fully-supervised upper-bound | 31.6 | Cross-task weakly supervised learning from instructional videos | |
Zhukov | 22.4 | Cross-task weakly supervised learning from instructional videos | |