HyperAI超神经

Self-Supervised Action Recognition on UCF101

Evaluation Metrics

3-fold Accuracy
Frozen
Pre-Training Dataset
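
As a rough illustration of the headline metric: UCF101 ships with three official train/test splits, and "3-fold accuracy" is commonly reported as the mean top-1 accuracy over those splits. The sketch below assumes that reading; the function name `three_fold_accuracy` and the input values are hypothetical and are not taken from any listed paper. Some entries on the leaderboard (for example those marked "Split 1") may report a single split instead.

```python
from statistics import mean


def three_fold_accuracy(split_accuracies):
    """Mean top-1 accuracy (in percent) over the three UCF101 splits.

    Assumes one accuracy value per official split, rounded to one decimal
    place to match the precision used on the leaderboard.
    """
    if len(split_accuracies) != 3:
        raise ValueError("expected exactly one accuracy per split (3 values)")
    return round(mean(split_accuracies), 1)


# Hypothetical per-split accuracies, for illustration only.
print(three_fold_accuracy([90.0, 91.0, 92.0]))
```
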

Evaluation Results

Performance of each model on this benchmark

| Model Name | 3-fold Accuracy | Frozen | Pre-Training Dataset | Paper Title | Repository |
|---|---|---|---|---|---|
| 3D RotNet (3D ResNet-18) | 62.9 | false | Kinetics400 | Self-Supervised Spatiotemporal Feature Learning via Video Rotation Prediction | - |
| DPC (3D ResNet-18, Split 1) | 60.6 | false | UCF101 | Video Representation Learning by Dense Predictive Coding | |
| XKD-Modality-Agnostic (ViT-B/112/16) | 93.4 | - | - | XKD: Cross-modal Knowledge Distillation with Domain Alignment for Video Representation Learning | |
| ViCC (S3D; R+F) | 90.5 | false | UCF101 | Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting | |
| CVRL (R3D-50; K400) | 92.2 | false | Kinetics400 | Spatiotemporal Contrastive Video Representation Learning | |
| 3D Cubic Puzzles (3D ResNet-18) | 65.8 | false | Kinetics400 | Self-Supervised Video Representation Learning with Space-Time Cubic Puzzles | - |
| AVID (Modified R2+1D-18 on Audioset) | 91.0 | false | Audioset (Audio+Video) | Audio-Visual Instance Discrimination with Cross-Modal Agreement | |
| CVRL (R3D-50; K600) | 93.4 | false | Kinetics600 | Spatiotemporal Contrastive Video Representation Learning | |
| CVRL (R3D-152 2x; K600) | 93.9 | false | Kinetics600 | Spatiotemporal Contrastive Video Representation Learning | |
| VideoGan (C3D) | 52.1 | false | UCF101 | Generating Videos with Scene Dynamics | - |
| AVID+CMA (Modified R2+1D-18 on Kinetics) | 87.5 | false | Kinetics400 (Audio+Video) | Audio-Visual Instance Discrimination with Cross-Modal Agreement | |
| RSPNet | 93.7 | false | Kinetics400 | RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning | |
| ViCC (S3D; RGB) | 72.2 | true | UCF101 | Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting | |
| VideoMAE (no extra data) | 91.3 | false | no extra data | VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training | - |
| AVID+CMA (Modified R2+1D-18 on Audioset) | 91.5 | false | Audioset (Audio+Video) | Audio-Visual Instance Discrimination with Cross-Modal Agreement | |
| MVD (ViT-B) | 97.5 | false | Kinetics400 | Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning | |
| CrissCross (Kinetics400) | 91.5 | false | Kinetics400 | Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Synchronicity | |
| CrissCross (Kinetics-Sound) | 88.3 | false | Kinetics-Sound | Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Synchronicity | |
| ViCC (S3D; RGB) | 88.8 | false | UCF101 | Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting | |
| ViCC (R2+1D; RGB) | 82.8 | false | UCF101 | Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting | |