HyperAI
Action Classification On Charades
Metrics
MAP (mean average precision)
Results
Performance of different models on this benchmark
| Model Name | MAP | Paper Title |
|---|---|---|
| TokenLearner | 66.3 | TokenLearner: What Can 8 Learned Tokens Do for Images and Videos? |
| TubeViT-L | 66.2 | Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning |
| MoViNet-A6 | 63.2 | MoViNets: Mobile Video Networks for Efficient Video Recognition |
| DEEP-HAL with ODF+SDF (AssembleNet++) | 62.29 | Self-supervising Action Recognition by Statistical Moment and Subspace Descriptors |
| AssembleNet++ 50 | 59.8 | AssembleNet++: Assembling Modality Representations via Attention Connections |
| AssembleNet-101 | 58.6 | AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures |
| AssembleNet | 58.6 | AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures |
| VicTR (ViT-L/14) | 57.6 | VicTR: Video-conditioned Text Representations for Activity Recognition |
| AssembleNet++ 50 without object | 54.98 | AssembleNet++: Assembling Modality Representations via Attention Connections |
| BIKE | 50.7 | Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models |
| DEEP-HAL with ODF+SDF (I3D) | 50.16 | Self-supervising Action Recognition by Statistical Moment and Subspace Descriptors |
| MoViNet-A4 | 48.5 | MoViNets: Mobile Video Networks for Efficient Video Recognition |
| AdaFocus (weak supervision, MViT-B-24, 32x3) | 47.8 | Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition |
| MViT-B-24, 32x3 (Kinetics-600 pretraining) | 47.7 | Multiscale Vision Transformers |
| En-VidTr-L | 47.3 | VidTr: Video Transformer Without Convolutions |
| MViT-B, 32x3 (Kinetics-600 pretraining) | 47.1 | Multiscale Vision Transformers |
| MViT-B-24, 32x3 (Kinetics-400 pretraining) | 46.3 | Multiscale Vision Transformers |
| SlowFast (Kinetics-600 pretraining, NL) | 45.2 | SlowFast Networks for Video Recognition |
| ActionCLIP (ViT-B/16) | 44.3 | ActionCLIP: A New Paradigm for Video Action Recognition |
| MViT-B, 32x3 (Kinetics-400 pretraining) | 44.3 | Multiscale Vision Transformers |