HyperAI

Action Recognition In Videos On Something 1

Metrics

GFLOPs
Param.
Top 1 Accuracy
Top 5 Accuracy

Results

Performance results of various models on this benchmark

| Model Name | GFLOPs | Param. | Top 1 Accuracy | Top 5 Accuracy | Paper Title | Repository |
|---|---|---|---|---|---|---|
| VoV3D-L (16frames, from scratch, single) | 9.3x6 | 5.8M | 49.5 | 78.0 | Diverse Temporal Aggregation and Depthwise Spatiotemporal Factorization for Efficient Video Classification | |
| ip-CSN-152 (IG-65M pretraining) | - | - | 53.3 | - | Video Classification with Channel-Separated Convolutional Networks | |
| VideoMAE V2-g | - | - | 68.7 | 91.9 | VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking | |
| HF-TSN (ImageNet pretraining) | - | - | 41.97 | - | Hierarchical Feature Aggregation Networks for Video Action Recognition | - |
| ECO-Net (ImageNet pretrained) | - | - | 46.4 | - | ECO: Efficient Convolutional Network for Online Video Understanding | |
| VoV3D-M (32frames, from scratch, single) | 11.5x6 | 3.3M | 49.8 | 78.0 | Diverse Temporal Aggregation and Depthwise Spatiotemporal Factorization for Efficient Video Classification | |
| TPS | - | - | 58.3 | - | Spatiotemporal Self-attention Modeling with Temporal Patch Shift for Action Recognition | |
| VoV3D-L (32frames, Kinetics pretrained, single) | 20.9x6 | 5.8M | 54.59 | 82.30 | Diverse Temporal Aggregation and Depthwise Spatiotemporal Factorization for Efficient Video Classification | |
| TAdaConvNeXtV2-B | - | - | 60.7 | - | Temporally-Adaptive Models for Efficient Video Understanding | |
| TAdaFormer-L/14 | - | - | 63.7 | - | Temporally-Adaptive Models for Efficient Video Understanding | |
| MSMA (8+16frames) | - | - | 57.9 | - | Multi-scale Motion-Aware Module for Video Action Recognition | - |
| TSM (RGB + Flow) | - | - | 50.7 | - | TSM: Temporal Shift Module for Efficient Video Understanding | |
| TCM (Ensemble) | - | - | 57.2 | - | Motion-driven Visual Tempo Learning for Video-based Action Recognition | |
| MARS+RGB+Flow (16 frames, Kinetics pretrained) | - | - | 40.4 | - | MARS: Motion-Augmented RGB Stream for Action Recognition | - |
| EAN ResNet50 (single clip, center crop, 8+16 ensemble, with sparse Transformer) | - | - | 57.2 | 83.9 | EAN: Event Adaptive Network for Enhanced Action Recognition | |
| ResNet50 I3D (Kinetics pretrained) | - | - | 48.6 | - | Moments in Time Dataset: one million videos for event understanding | |
| 2-Stream TRN | - | - | 42.01 | - | Temporal Relational Reasoning in Videos | |
| ir-CSN-101 | - | - | 48.4 | - | Video Classification with Channel-Separated Convolutional Networks | |
| SELFYNet-TSM-R50 (16 frames, ImageNet pretrained) | - | - | 54.3 | 82.9 | Learning Self-Similarity in Space and Time as Generalized Motion for Video Action Recognition | |
| CT-Net Ensemble (R50, 8+12+16+24) | - | - | 56.6 | - | CT-Net: Channel Tensorization Network for Video Classification | |