HyperAI
Action Recognition On Diving 48
Metrics
Accuracy
Results
Performance results of the different models on this benchmark.
| Model name | Accuracy | Paper Title | Repository |
| --- | --- | --- | --- |
| ORViT TimeSformer | 88.0 | Object-Region Video Transformers | |
| VIMPAC | 85.5 | VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning | |
| SlowFast | 77.6 | SlowFast Networks for Video Recognition | |
| LVMAE | 94.9 | Extending Video Masked Autoencoders to 128 frames | - |
| StructVit-B-4-1 | 88.3 | Learning Correlation Structures for Vision Transformers | - |
| TimeSformer | 75 | Is Space-Time Attention All You Need for Video Understanding? | |
| DUALPATH | 88.7 | Dual-path Adaptation from Image to Video Transformers | |
| TimeSformer-HR | 78 | Is Space-Time Attention All You Need for Video Understanding? | |
| TFCNet | 88.3 | TFCNet: Temporal Fully Connected Networks for Static Unbiased Temporal Reasoning | - |
| Video-FocalNet-B | 90.8 | Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition | |
| AIM (CLIP ViT-L/14, 32x224) | 90.6 | AIM: Adapting Image Models for Efficient Video Action Recognition | |
| RSANet-R50 (16 frames, ImageNet pretrained, a single clip) | 84.2 | Relational Self-Attention: What's Missing in Attention for Video Understanding | |
| GC-TDN | 87.6 | Group Contextualization for Video Recognition | |
| BEVT | 86.7 | BEVT: BERT Pretraining of Video Transformers | |
| PMI Sampler | 81.3 | PMI Sampler: Patch Similarity Guided Frame Selection for Aerial Action Recognition | |
| TQN | 81.8 | Temporal Query Networks for Fine-grained Video Understanding | - |
| TimeSformer-L | 81 | Is Space-Time Attention All You Need for Video Understanding? | |
| PSB | 86 | Spatiotemporal Self-attention Modeling with Temporal Patch Shift for Action Recognition | |