HyperAI超神经
Action Recognition on Diving-48

Evaluation metric: Accuracy

Results: performance of each model on this benchmark.
| Model | Accuracy | Paper Title | Repository |
|---|---|---|---|
| ORViT TimeSformer | 88.0 | Object-Region Video Transformers | |
| VIMPAC | 85.5 | VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning | |
| SlowFast | 77.6 | SlowFast Networks for Video Recognition | |
| LVMAE | 94.9 | Extending Video Masked Autoencoders to 128 frames | - |
| StructVit-B-4-1 | 88.3 | Learning Correlation Structures for Vision Transformers | - |
| TimeSformer | 75 | Is Space-Time Attention All You Need for Video Understanding? | |
| DUALPATH | 88.7 | Dual-path Adaptation from Image to Video Transformers | |
| TimeSformer-HR | 78 | Is Space-Time Attention All You Need for Video Understanding? | |
| TFCNet | 88.3 | TFCNet: Temporal Fully Connected Networks for Static Unbiased Temporal Reasoning | - |
| Video-FocalNet-B | 90.8 | Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition | |
| AIM (CLIP ViT-L/14, 32x224) | 90.6 | AIM: Adapting Image Models for Efficient Video Action Recognition | |
| RSANet-R50 (16 frames, ImageNet pretrained, a single clip) | 84.2 | Relational Self-Attention: What's Missing in Attention for Video Understanding | |
| GC-TDN | 87.6 | Group Contextualization for Video Recognition | |
| BEVT | 86.7 | BEVT: BERT Pretraining of Video Transformers | |
| PMI Sampler | 81.3 | PMI Sampler: Patch Similarity Guided Frame Selection for Aerial Action Recognition | |
| TQN | 81.8 | Temporal Query Networks for Fine-grained Video Understanding | - |
| TimeSformer-L | 81 | Is Space-Time Attention All You Need for Video Understanding? | |
| PSB | 86 | Spatiotemporal Self-attention Modeling with Temporal Patch Shift for Action Recognition | |
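The Accuracy figures above are standard top-1 classification accuracy: the fraction of test clips whose highest-scoring class matches the ground-truth dive label (Diving-48 has 48 classes). A minimal sketch of the computation, assuming NumPy and hypothetical logit/label arrays:

```python
import numpy as np

def top1_accuracy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Top-1 accuracy: fraction of clips whose argmax class
    matches the ground-truth label. logits: (N, C), labels: (N,)."""
    preds = logits.argmax(axis=1)
    return float((preds == labels).mean())

# Hypothetical example: 4 clips, 48 Diving-48 classes.
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 48))
labels = logits.argmax(axis=1).copy()
labels[0] = (labels[0] + 1) % 48  # force one misclassification
print(top1_accuracy(logits, labels))  # 3 of 4 correct -> 0.75
```

Reported leaderboard numbers are this quantity expressed as a percentage (e.g. 94.9 means 94.9% of clips classified correctly).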