Action Recognition in Videos on Something-Something V1

Metrics

GFLOPs

Param.

Top 1 Accuracy

Top 5 Accuracy
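
As an illustration of the two accuracy metrics above, here is a minimal sketch of the usual top-k accuracy definition: a sample counts as correct when its true label is among the k highest-scoring classes. The function name `top_k_accuracy` and the toy scores are hypothetical and are not taken from any paper listed on this page.

```python
def top_k_accuracy(scores, labels, k):
    """Return the percentage of samples whose true label is among
    the k classes with the highest predicted score."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(labels)


# Toy data (hypothetical): 4 samples, 3 classes.
scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
    [0.6, 0.3, 0.1],
]
labels = [1, 1, 0, 0]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
```

Top 5 Accuracy on this benchmark would correspond to `k=5` over the dataset's full set of classes.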

Results

Performance of the different models on this benchmark

Model Name	GFLOPs	Param.	Top 1 Accuracy	Top 5 Accuracy	Paper Title	Repository
VoV3D-L (16frames, from scratch, single)	9.3x6	5.8M	49.5	78.0	Diverse Temporal Aggregation and Depthwise Spatiotemporal Factorization for Efficient Video Classification
ip-CSN-152 (IG-65M pretraining)	-	-	53.3	-	Video Classification with Channel-Separated Convolutional Networks
VideoMAE V2-g	-	-	68.7	91.9	VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
HF-TSN (ImageNet pretraining)	-	-	41.97	-	Hierarchical Feature Aggregation Networks for Video Action Recognition	-
ECO-Net (ImageNet pretrained)	-	-	46.4	-	ECO: Efficient Convolutional Network for Online Video Understanding
VoV3D-M (32frames, from scratch, single)	11.5x6	3.3M	49.8	78.0	Diverse Temporal Aggregation and Depthwise Spatiotemporal Factorization for Efficient Video Classification
TPS	-	-	58.3	-	Spatiotemporal Self-attention Modeling with Temporal Patch Shift for Action Recognition
VoV3D-L (32frames, Kinetics pretrained, single)	20.9x6	5.8M	54.59	82.30	Diverse Temporal Aggregation and Depthwise Spatiotemporal Factorization for Efficient Video Classification
TAdaConvNeXtV2-B	-	-	60.7	-	Temporally-Adaptive Models for Efficient Video Understanding
TAdaFormer-L/14	-	-	63.7	-	Temporally-Adaptive Models for Efficient Video Understanding
MSMA (8+16frames)	-	-	57.9	-	Multi-scale Motion-Aware Module for Video Action Recognition	-
TSM (RGB + Flow)	-	-	50.7	-	TSM: Temporal Shift Module for Efficient Video Understanding
TCM (Ensemble)	-	-	57.2	-	Motion-driven Visual Tempo Learning for Video-based Action Recognition
MARS+RGB+Flow (16 frames, Kinetics pretrained)	-	-	40.4	-	MARS: Motion-Augmented RGB Stream for Action Recognition	-
EAN ResNet50 (single clip, center crop,8+16 ensemble, with sparse Transformer)	-	-	57.2	83.9	EAN: Event Adaptive Network for Enhanced Action Recognition
ResNet50 I3D (Kinetics pretrained)	-	-	48.6	-	Moments in Time Dataset: one million videos for event understanding
2-Stream TRN	-	-	42.01	-	Temporal Relational Reasoning in Videos
ir-CSN-101	-	-	48.4	-	Video Classification with Channel-Separated Convolutional Networks
SELFYNet-TSM-R50 (16 frames, ImageNet pretrained)	-	-	54.3	82.9	Learning Self-Similarity in Space and Time as Generalized Motion for Video Action Recognition
CT-Net Ensemble (R50, 8+12+16+24)	-	-	56.6	-	CT-Net: Channel Tensorization Network for Video Classification
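
The rows above are tab-separated, so they can be loaded and ranked with a few lines of code. The sketch below copies five rows from the table and sorts them by Top 1 Accuracy. It assumes that a GFLOPs entry such as `9.3x6` means 9.3 GFLOPs per view times 6 views; that reading is a common convention but is not stated on this page. The helper names (`parse_number`, `total_gflops`, `load`) are hypothetical.

```python
import csv
import io

# Five rows copied from the table above (name, GFLOPs, Param., Top 1, Top 5).
ROWS = """\
VoV3D-L (16frames, from scratch, single)\t9.3x6\t5.8M\t49.5\t78.0
VideoMAE V2-g\t-\t-\t68.7\t91.9
VoV3D-M (32frames, from scratch, single)\t11.5x6\t3.3M\t49.8\t78.0
VoV3D-L (32frames, Kinetics pretrained, single)\t20.9x6\t5.8M\t54.59\t82.30
TSM (RGB + Flow)\t-\t-\t50.7\t-
"""


def parse_number(cell):
    """Convert a table cell to float; '-' marks a missing value."""
    return None if cell == "-" else float(cell)


def total_gflops(cell):
    """Assumption: 'AxB' means A GFLOPs per view times B views."""
    if cell == "-":
        return None
    per_view, views = cell.split("x")
    return float(per_view) * int(views)


def load(text):
    """Parse tab-separated leaderboard rows into dictionaries."""
    records = []
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    for name, gflops, params, top1, top5 in reader:
        records.append({
            "name": name,
            "gflops_total": total_gflops(gflops),
            "params": None if params == "-" else params,
            "top1": parse_number(top1),
            "top5": parse_number(top5),
        })
    return records


records = load(ROWS)
# Rank by Top 1 Accuracy, best first.
ranked = sorted(records, key=lambda r: r["top1"], reverse=True)
for r in ranked:
    print(f'{r["top1"]:6.2f}  {r["name"]}')
```

Missing values are kept as `None` rather than zero, so models without a reported Top 5 Accuracy or GFLOPs figure are not mistaken for models that scored or cost nothing.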
