Action Classification On Moments In Time
Metrics
Top 1 Accuracy
Results
Performance results of various models on this benchmark
| Model Name | Top 1 Accuracy (%) | Paper Title | Repository |
| --- | --- | --- | --- |
| MoViNet-A5 | 39.1 | MoViNets: Mobile Video Networks for Efficient Video Recognition | |
| MoViNet-A4 | 37.9 | MoViNets: Mobile Video Networks for Efficient Video Recognition | |
| UMT-L (ViT-L/16) | 48.7 | Unmasked Teacher: Towards Training-Efficient Video Foundation Models | |
| I3D | 29.51 | Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset | |
| TSN-2Stream | - | Temporal Segment Networks for Action Recognition in Videos | |
| SRTG r3d-34 | 28.55 | Learn to cycle: Time-consistent feature discovery for action recognition | |
| MoViNet-A0 | 27.5 | MoViNets: Mobile Video Networks for Efficient Video Recognition | |
| UniFormerV2-L | 47.8 | UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer | |
| TRN-Multiscale | 28.27 | Temporal Relational Reasoning in Videos | |
| SRTG r3d-101 | 33.56 | Learn to cycle: Time-consistent feature discovery for action recognition | |
| AssembleNet | 34.27 | AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures | |
| SRTG r3d-50 | 30.72 | Learn to cycle: Time-consistent feature discovery for action recognition | |
| CoVeR(JFT-3B) | 46.1 | Co-training Transformer with Videos and Images Improves Action Recognition | - |
| EvaNet | 31.8 | Evolving Space-Time Neural Architectures for Videos | - |
| InternVideo2-1B | 50.9 | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | |
| CoVeR(JFT-300M) | 45.0 | Co-training Transformer with Videos and Images Improves Action Recognition | - |
| CoST (ResNet-101, 32 frames) | 32.4 | Collaborative Spatiotemporal Feature Learning for Video Action Recognition | |
| OmniVec2 | 53.1 | OmniVec2 - A Novel Transformer based Network for Large Scale Multimodal and Multitask Learning | - |
| MBT (AV) | 37.3 | Attention Bottlenecks for Multimodal Fusion | |
| ViViT-L/16x2 | - | ViViT: A Video Vision Transformer | |