Self-Supervised Action Recognition on HMDB51
Metrics
Frozen
Pre-Training Dataset
Top-1 Accuracy
Results
Performance results of various models on this benchmark
Comparison Table
Model Name | Frozen | Pre-Training Dataset | Top-1 Accuracy (%) |
---|---|---|---|
rspnet-relative-speed-perception-for | false | Kinetics400 | 64.7 |
videomae-masked-autoencoders-are-data-1 | false | Kinetics400 | 73.3 |
self-supervised-video-representation-using | false | UCF101 | 43.2 |
self-supervised-learning-by-cross-modal-audio | false | IG-Random | 66.5 |
self-supervised-spatio-temporal | false | UCF101 | 20.3 |
self-supervised-learning-by-cross-modal-audio | false | Kinetics400 | 52.6 |
self-supervised-spatiotemporal-learning-via | false | UCF101 | 29.5 |
self-supervised-learning-by-cross-modal-audio | false | AudioSet | 63.7 |
efficient-video-representation-learning-via | false | no extra data | 65.8 |
self-supervised-video-representation-learning-7 | false | UCF101 | 52.4 |
self-supervised-co-training-for-video | false | - | 46.1 |
slic-self-supervised-learning-with-iterative-1 | false | UCF101 | 54.5 |
self-supervised-video-representation-learning | false | Kinetics400 | 33.7 |
self-supervised-learning-by-cross-modal-audio | false | IG-Kinetics | 68.9 |
self-supervised-spatiotemporal-feature | false | Kinetics400 | 33.7 |
spatiotemporal-contrastive-video | false | Kinetics600 | 69.9 |
broaden-your-views-for-self-supervised-video | false | - | 70.5 |
xkd-cross-modal-knowledge-distillation-with | - | - | 65.9 |
shuffle-and-learn-unsupervised-learning-using | false | UCF101 | 19.8 |
video-representation-learning-by-dense | false | Kinetics400 | 34.5 |
self-supervised-video-representation-learning-8 | false | UCF101 | 54.8 |
spatiotemporal-contrastive-video | false | Kinetics400 | 66.7 |
evolving-losses-for-unsupervised-video | false | - | 64.5 |
video-cloze-procedure-for-self-supervised | false | UCF101 | 31.5 |
videomae-masked-autoencoders-are-data-1 | false | no extra data | 62.6 |
similarity-contrastive-estimation-for-image | false | Kinetics400 | 74.7 |
audio-visual-instance-discrimination-with | false | Kinetics400 (Video+Audio) | 59.9 |
self-supervised-video-representation-learning-7 | false | UCF101 | 62.2 |
temporally-coherent-embeddings-for-self | false | Kinetics400 | 34.2 |
tclr-temporal-contrastive-learning-for-video | false | UCF101 | 52.9 |
m-3-video-masked-motion-modeling-for-self | false | Kinetics400 | 78.0 |
self-supervised-video-representation-learning-7 | false | UCF101 | 61.5 |
audio-visual-instance-discrimination-with | false | Kinetics400 (Video+Audio) | 60.8 |
self-supervised-audio-visual-representation | false | AudioSet | 66.8 |
self-supervised-audio-visual-representation | false | Kinetics400 | 64.7 |
self-supervised-video-representation-learning-8 | false | UCF101 | 54.5 |
self-supervised-video-representation-learning-7 | true | UCF101 | 38.5 |
video-representation-learning-by-dense | false | Kinetics400 | 35.7 |
self-supervised-audio-visual-representation | false | Kinetics-Sound | 60.5 |
audio-visual-instance-discrimination-with | false | AudioSet (Video+Audio) | 64.7 |
unsupervised-representation-learning-by | false | UCF101 | 23.8 |
self-supervised-video-representation-learning-3 | false | UCF101 | 38.3 |
xkd-cross-modal-knowledge-distillation-with | - | - | 69 |
a-large-scale-study-on-unsupervised | false | Kinetics400 | 75.0 |
temporally-coherent-embeddings-for-self | false | Kinetics400 | 36.6 |
spatiotemporal-contrastive-video | false | Kinetics600 | 68.0 |
audio-visual-instance-discrimination-with | false | AudioSet (Video+Audio) | 64.1 |
masked-video-distillation-rethinking-masked | false | Kinetics400 | 79.7 |