Self Supervised Action Recognition On Hmdb51
Metrics
Frozen
Pre-Training Dataset
Top-1 Accuracy
Results
Performance results of various models on this benchmark
| Model Name | Frozen | Pre-Training Dataset | Top-1 Accuracy (%) | Paper Title |
|---|---|---|---|---|
| MVD (ViT-B) | false | Kinetics400 | 79.7 | Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning |
| M3Video | false | Kinetics400 | 78.0 | Masked Motion Encoding for Self-Supervised Video Representation Learning |
| pBYOL | false | Kinetics400 | 75.0 | A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning |
| SCE (R3D-50) | false | Kinetics400 | 74.7 | Similarity Contrastive Estimation for Image and Video Soft Contrastive Self-Supervised Learning |
| VideoMAE | false | Kinetics400 | 73.3 | VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training |
| BraVe:V-FA (TSM-50x2) | false | - | 70.5 | Broaden Your Views for Self-Supervised Video Learning |
| CVRL (R3D-152 2x; K600) | false | Kinetics600 | 69.9 | Spatiotemporal Contrastive Video Representation Learning |
| XKD (ViT-B/112/16) | - | - | 69 | XKD: Cross-modal Knowledge Distillation with Domain Alignment for Video Representation Learning |
| XDC | false | IG-Kinetics | 68.9 | Self-Supervised Learning by Cross-Modal Audio-Video Clustering |
| CVRL (R3D-50; K600) | false | Kinetics600 | 68.0 | Spatiotemporal Contrastive Video Representation Learning |
| CrissCross (AudioSet) | false | AudioSet | 66.8 | Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Synchronicity |
| CVRL (R3D-50; K400) | false | Kinetics400 | 66.7 | Spatiotemporal Contrastive Video Representation Learning |
| XDC | false | IG-Random | 66.5 | Self-Supervised Learning by Cross-Modal Audio-Video Clustering |
| XKD-Modality-Agnostic (ViT-B/112/16) | - | - | 65.9 | XKD: Cross-modal Knowledge Distillation with Domain Alignment for Video Representation Learning |
| VideoMS (ViT-B) | false | no extra data | 65.8 | EVEREST: Efficient Masked Video Autoencoder by Removing Redundant Spatiotemporal Tokens |
| RSPNet | false | Kinetics400 | 64.7 | RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning |
| CrissCross (Kinetics400) | false | Kinetics400 | 64.7 | Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Synchronicity |
| AVID+CMA (Modified R2+1D-18 on Audioset) | false | Audioset (Video+Audio) | 64.7 | Audio-Visual Instance Discrimination with Cross-Modal Agreement |
| ELo | false | - | 64.5 | Evolving Losses for Unsupervised Video Representation Learning |
| AVID (Modified R2+1D-18 on Audioset) | false | Audioset (Video+Audio) | 64.1 | Audio-Visual Instance Discrimination with Cross-Modal Agreement |
Showing 20 of 48 rows.