Self-Supervised Action Recognition on UCF101
Evaluation Metrics
3-fold Accuracy
Frozen
Pre-Training Dataset
Results
Performance of each model on this benchmark
| Model | 3-fold Accuracy | Frozen | Pre-Training Dataset | Paper Title |
| --- | --- | --- | --- | --- |
| VideoMAE V2-g | 99.6 | - | - | VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking |
| MVD (ViT-B) | 97.5 | false | Kinetics400 | Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning |
| SSL-KD (R21D-18) | 97.3 | false | Kinetics400 | A Large-Scale Analysis on Self-Supervised Video Representation Learning |
| M3Video | 96.5 | false | Kinetics400 | Masked Motion Encoding for Self-Supervised Video Representation Learning |
| pBYOL | 96.3 | false | Kinetics400 | A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning |
| VideoMAE | 96.1 | false | Kinetics400 | VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training |
| SCE (R3D-50) | 95.3 | false | Kinetics400 | Similarity Contrastive Estimation for Image and Video Soft Contrastive Self-Supervised Learning |
| MMV TSM-50x2 | 95.2 | false | Audioset + Howto100M | Self-Supervised MultiModal Versatile Networks |
| XKD (ViT-B/112/16) | 94.1 | - | Kinetics400 | XKD: Cross-modal Knowledge Distillation with Domain Alignment for Video Representation Learning |
| CVRL (R3D-152 2x; K600) | 93.9 | false | Kinetics600 | Spatiotemporal Contrastive Video Representation Learning |
| RSPNet | 93.7 | false | Kinetics400 | RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning |
| XKD-Modality-Agnostic (ViT-B/112/16) | 93.4 | - | - | XKD: Cross-modal Knowledge Distillation with Domain Alignment for Video Representation Learning |
| CVRL (R3D-50; K600) | 93.4 | false | Kinetics600 | Spatiotemporal Contrastive Video Representation Learning |
| VideoMS (ViT-B) | 93.4 | false | no extra data | EVEREST: Efficient Masked Video Autoencoder by Removing Redundant Spatiotemporal Tokens |
| BraVe:V-FA (TSM-50x2) | 93.1 | false | - | Broaden Your Views for Self-Supervised Video Learning |
| CrissCross (AudioSet) | 92.4 | false | AudioSet | Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Synchronicity |
| CVRL (R3D-50; K400) | 92.2 | false | Kinetics400 | Spatiotemporal Contrastive Video Representation Learning |
| AVID+CMA (Modified R2+1D-18 on Audioset) | 91.5 | false | Audioset (Audio+Video) | Audio-Visual Instance Discrimination with Cross-Modal Agreement |
| CrissCross (Kinetics400) | 91.5 | false | Kinetics400 | Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Synchronicity |
| VideoMAE(no extra data) | 91.3 | false | no extra data | VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training |