Action Segmentation On Coin
Metrics
Frame accuracy
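As a rough illustration of the metric, frame accuracy is commonly taken to be the share of video frames whose predicted label matches the ground-truth label. The sketch below assumes that definition and reports it as a percentage; the function name `frame_accuracy` and the sample labels are hypothetical, and individual papers on this benchmark may differ in details such as how background frames are treated.

```python
from typing import Sequence


def frame_accuracy(predicted: Sequence[int], ground_truth: Sequence[int]) -> float:
    """Return the percentage of frames whose predicted label equals the ground truth."""
    if len(predicted) != len(ground_truth):
        raise ValueError("label sequences must have the same length")
    if not ground_truth:
        raise ValueError("label sequences must not be empty")
    # Count frames where the prediction agrees with the annotation.
    correct = sum(p == g for p, g in zip(predicted, ground_truth))
    return 100.0 * correct / len(ground_truth)


# Hypothetical per-frame step labels for a 10-frame clip (0 = background).
predicted = [0, 0, 1, 1, 1, 2, 2, 0, 0, 3]
ground_truth = [0, 1, 1, 1, 1, 2, 2, 2, 0, 3]
print(frame_accuracy(predicted, ground_truth))  # 8 of 10 frames match -> 80.0
```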
Results
Performance results of various models on this benchmark
| Model Name | Frame accuracy (%) | Paper Title | Repository |
| --- | --- | --- | --- |
| Norton | 69.8 | Multi-granularity Correspondence Learning from Long-term Noisy Videos | |
| VLM | 68.4 | VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding | |
| CBT | 53.9 | End-to-End Learning of Visual Representations from Uncurated Instructional Videos | |
| MIL-NCE | 61.0 | End-to-End Learning of Visual Representations from Uncurated Instructional Videos | |
| VideoCLIP | 68.7 | VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding | |
| UnLoc-L | 72.8 | UnLoc: A Unified Framework for Video Localization Tasks | |
| TACo | 68.4 | TACo: Token-aware Cascade Contrastive Learning for Video-Text Alignment | - |
| ActBERT | 57.0 | ActBERT: Learning Global-Local Video-Text Representations | - |
| UniVL | 70.0 | UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation | |