Action Classification On Charades
Evaluation metric
MAP (mean average precision)
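The page does not say how the MAP column is computed. Assuming it is the usual multi-label mean average precision (per-class average precision, averaged over classes and reported as a percentage), a minimal sketch might look like the following. The function names and the toy data are hypothetical, and an official evaluation script may handle ties or classes without positives differently.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AP for one class: mean of precision@k over the ranks k that hold a positive."""
    # Rank items from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx]:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(
    score_matrix: Sequence[Sequence[float]],
    label_matrix: Sequence[Sequence[int]],
) -> float:
    """mAP in percent: per-class AP averaged over classes (columns).

    Assumption: classes with no positive example are skipped.
    """
    num_classes = len(score_matrix[0])
    aps = []
    for c in range(num_classes):
        col_scores = [row[c] for row in score_matrix]
        col_labels = [row[c] for row in label_matrix]
        if any(col_labels):
            aps.append(average_precision(col_scores, col_labels))
    return 100.0 * sum(aps) / len(aps) if aps else 0.0


# Toy data (hypothetical): 4 videos, 2 action classes, multi-label ground truth.
toy_scores = [[0.9, 0.2], [0.8, 0.7], [0.3, 0.6], [0.1, 0.4]]
toy_labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
toy_map = mean_average_precision(toy_scores, toy_labels)
```

On the toy data, class 0 has AP (1 + 2/3) / 2 and class 1 has AP 1, so `toy_map` is about 91.67.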
Results
Performance of each model on this benchmark
| Model | MAP | Paper Title |
| --- | --- | --- |
| TokenLearner | 66.3 | TokenLearner: What Can 8 Learned Tokens Do for Images and Videos? |
| TubeViT-L | 66.2 | Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning |
| MoViNet-A6 | 63.2 | MoViNets: Mobile Video Networks for Efficient Video Recognition |
| DEEP-HAL with ODF+SDF (AssembleNet++) | 62.29 | Self-supervising Action Recognition by Statistical Moment and Subspace Descriptors |
| AssembleNet++ 50 | 59.8 | AssembleNet++: Assembling Modality Representations via Attention Connections |
| AssembleNet-101 | 58.6 | AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures |
| AssembleNet | 58.6 | AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures |
| VicTR (ViT-L/14) | 57.6 | VicTR: Video-conditioned Text Representations for Activity Recognition |
| AssembleNet++ 50 without object | 54.98 | AssembleNet++: Assembling Modality Representations via Attention Connections |
| BIKE | 50.7 | Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models |
| DEEP-HAL with ODF+SDF (I3D) | 50.16 | Self-supervising Action Recognition by Statistical Moment and Subspace Descriptors |
| MoViNet-A4 | 48.5 | MoViNets: Mobile Video Networks for Efficient Video Recognition |
| AdaFocus (weak supervision, MViT-B-24, 32x3) | 47.8 | Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition |
| MViT-B-24, 32x3 (Kinetics-600 pretraining) | 47.7 | Multiscale Vision Transformers |
| En-VidTr-L | 47.3 | VidTr: Video Transformer Without Convolutions |
| MViT-B, 32x3 (Kinetics-600 pretraining) | 47.1 | Multiscale Vision Transformers |
| MViT-B-24, 32x3 (Kinetics-400 pretraining) | 46.3 | Multiscale Vision Transformers |
| SlowFast (Kinetics-600 pretraining, NL) | 45.2 | SlowFast Networks for Video Recognition |
| ActionCLIP (ViT-B/16) | 44.3 | ActionCLIP: A New Paradigm for Video Action Recognition |
| MViT-B, 32x3 (Kinetics-400 pretraining) | 44.3 | Multiscale Vision Transformers |