HyperAI

Action Recognition In Videos On Something-Something V2

Metrics

Top-1 Accuracy
Top-5 Accuracy
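
Both metrics are standard top-k classification accuracies: a prediction counts as correct under top-k if the ground-truth class is among the model's k highest-scoring classes, and the reported numbers are the resulting fractions expressed as percentages. A minimal NumPy sketch of how they are typically computed (the `logits` and `labels` arrays below are illustrative placeholders, not part of any specific model's API):

```python
import numpy as np

def topk_accuracy(logits: np.ndarray, labels: np.ndarray, k: int = 1) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes.

    logits: (num_samples, num_classes) array of class scores.
    labels: (num_samples,) array of ground-truth class indices.
    """
    # Indices of the k largest scores per sample (order within the top-k does not matter).
    topk = np.argsort(logits, axis=1)[:, -k:]
    hits = (topk == labels[:, None]).any(axis=1)
    return float(hits.mean())

# Toy example: 3 samples, 5 classes.
logits = np.array([[0.1, 0.5, 0.2, 0.1, 0.1],
                   [0.3, 0.1, 0.4, 0.1, 0.1],
                   [0.2, 0.2, 0.2, 0.3, 0.1]])
labels = np.array([1, 0, 4])
print(topk_accuracy(logits, labels, k=1))  # ~0.33: only the first sample's argmax matches
print(topk_accuracy(logits, labels, k=5))  # 1.0: with k equal to the number of classes, every sample counts
```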

Results

Performance results of various models on this benchmark

| Model name | Top-1 Accuracy | Top-5 Accuracy | Paper Title | Repository |
| --- | --- | --- | --- | --- |
| TRG (ResNet-50) | 62.2 | 90.3 | Temporal Reasoning Graph for Activity Recognition | - |
| MVFNet-ResNet50 (center crop, 8+16 ensemble, ImageNet pretrained, RGB only) | 66.3 | - | MVFNet: Multi-View Fusion Network for Efficient Video Recognition | - |
| Mformer-L | 68.1 | 91.2 | Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers | |
| ViC-MAE (ViT-L) | 73.7 | - | ViC-MAE: Self-Supervised Representation Learning from Images and Video with Contrastive Masked Autoencoders | |
| bLVNet | 65.2 | - | More Is Less: Learning Efficient Video Representations by Big-Little Network and Depthwise Temporal Aggregation | |
| TPS | 69.8 | - | Spatiotemporal Self-attention Modeling with Temporal Patch Shift for Action Recognition | |
| VideoMAE (no extra data, ViT-B, 16 frames) | 70.8 | 92.4 | VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training | - |
| MSNet-R50En (8+16 ensemble, ImageNet pretrained) | 66.6 | 90.6 | MotionSqueeze: Neural Motion Feature Learning for Video Understanding | |
| TAda2D (ResNet-50, 8 frames) | 64.0 | 88.0 | TAda! Temporally-Adaptive Convolutions for Video Understanding | |
| VideoMAE (no extra data, ViT-L, 32x2) | 75.4 | 95.2 | VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training | - |
| PAN ResNet101 (RGB only, no Flow) | 66.5 | 90.6 | PAN: Towards Fast Action Recognition via Learning Persistence of Appearance | |
| MoViNet-A0 | 61.3 | 88.2 | MoViNets: Mobile Video Networks for Efficient Video Recognition | |
| Mformer-HR | 67.1 | 90.6 | Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers | |
| ORViT Mformer (ORViT blocks) | 67.9 | 90.5 | Object-Region Video Transformers | |
| SVT | 59.2 | - | Self-supervised Video Transformer | |
| STM + TRN Multiscale | 47.73 | - | Comparative Analysis of CNN-based Spatiotemporal Reasoning in Videos | |
| TimeSformer-L | 62.3 | - | Is Space-Time Attention All You Need for Video Understanding? | |
| MML (single) | 66.83 | 91.30 | Mutual Modality Learning for Video Action Classification | |
| AMD (ViT-S/16) | 70.2 | 92.5 | Asymmetric Masked Distillation for Pre-Training Small Foundation Models | - |
| UniFormer-B (IN-1K + Kinetics400 pretrain) | 71.2 | 92.8 | UniFormer: Unified Transformer for Efficient Spatial-Temporal Representation Learning | |