HyperAI
Action Recognition In Videos
Action Recognition on Diving-48
Metric: Accuracy
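The accuracy reported on this benchmark is top-1 classification accuracy: the percentage of test videos whose predicted class matches the ground-truth dive class. As a minimal sketch (the predictions and labels below are hypothetical, not taken from any model on the leaderboard):

```python
def top1_accuracy(predictions, labels):
    """Percentage of videos whose predicted class matches the ground truth."""
    assert len(predictions) == len(labels) and len(labels) > 0
    correct = sum(p == y for p, y in zip(predictions, labels))
    return 100.0 * correct / len(labels)

# Hypothetical per-video predicted vs. ground-truth class indices
preds  = [7, 12, 3, 44, 7]
labels = [7, 12, 5, 44, 7]
print(top1_accuracy(preds, labels))  # 80.0
```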
Results
Performance results of various models on this benchmark
| Model Name | Accuracy | Paper Title | Repository |
|---|---|---|---|
| ORViT TimeSformer | 88.0 | Object-Region Video Transformers | |
| VIMPAC | 85.5 | VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning | |
| SlowFast | 77.6 | SlowFast Networks for Video Recognition | |
| LVMAE | 94.9 | Extending Video Masked Autoencoders to 128 frames | - |
| StructVit-B-4-1 | 88.3 | Learning Correlation Structures for Vision Transformers | - |
| TimeSformer | 75.0 | Is Space-Time Attention All You Need for Video Understanding? | |
| DUALPATH | 88.7 | Dual-path Adaptation from Image to Video Transformers | |
| TimeSformer-HR | 78.0 | Is Space-Time Attention All You Need for Video Understanding? | |
| TFCNet | 88.3 | TFCNet: Temporal Fully Connected Networks for Static Unbiased Temporal Reasoning | - |
| Video-FocalNet-B | 90.8 | Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition | |
| AIM (CLIP ViT-L/14, 32x224) | 90.6 | AIM: Adapting Image Models for Efficient Video Action Recognition | |
| RSANet-R50 (16 frames, ImageNet pretrained, a single clip) | 84.2 | Relational Self-Attention: What's Missing in Attention for Video Understanding | |
| GC-TDN | 87.6 | Group Contextualization for Video Recognition | |
| BEVT | 86.7 | BEVT: BERT Pretraining of Video Transformers | |
| PMI Sampler | 81.3 | PMI Sampler: Patch Similarity Guided Frame Selection for Aerial Action Recognition | |
| TQN | 81.8 | Temporal Query Networks for Fine-grained Video Understanding | - |
| TimeSformer-L | 81.0 | Is Space-Time Attention All You Need for Video Understanding? | |
| PSB | 86.0 | Spatiotemporal Self-attention Modeling with Temporal Patch Shift for Action Recognition | |