HyperAI
Self-Supervised Action Recognition on UCF101
Evaluation Metrics
3-fold Accuracy
Frozen
Pre-Training Dataset
Evaluation Results
Performance of each model on this benchmark
| Model Name | 3-fold Accuracy (%) | Frozen | Pre-Training Dataset | Paper Title |
| --- | --- | --- | --- | --- |
| VideoMAE V2-g | 99.6 | - | - | VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking |
| MVD (ViT-B) | 97.5 | false | Kinetics400 | Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning |
| SSL-KD (R21D-18) | 97.3 | false | Kinetics400 | A Large-Scale Analysis on Self-Supervised Video Representation Learning |
| M3Video | 96.5 | false | Kinetics400 | Masked Motion Encoding for Self-Supervised Video Representation Learning |
| pBYOL | 96.3 | false | Kinetics400 | A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning |
| VideoMAE | 96.1 | false | Kinetics400 | VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training |
| SCE (R3D-50) | 95.3 | false | Kinetics400 | Similarity Contrastive Estimation for Image and Video Soft Contrastive Self-Supervised Learning |
| MMV TSM-50x2 | 95.2 | false | Audioset + Howto100M | Self-Supervised MultiModal Versatile Networks |
| XKD (ViT-B/112/16) | 94.1 | - | Kinetics400 | XKD: Cross-modal Knowledge Distillation with Domain Alignment for Video Representation Learning |
| CVRL (R3D-152 2x; K600) | 93.9 | false | Kinetics600 | Spatiotemporal Contrastive Video Representation Learning |
| RSPNet | 93.7 | false | Kinetics400 | RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning |
| XKD-Modality-Agnostic (ViT-B/112/16) | 93.4 | - | - | XKD: Cross-modal Knowledge Distillation with Domain Alignment for Video Representation Learning |
| CVRL (R3D-50; K600) | 93.4 | false | Kinetics600 | Spatiotemporal Contrastive Video Representation Learning |
| VideoMS (ViT-B) | 93.4 | false | no extra data | EVEREST: Efficient Masked Video Autoencoder by Removing Redundant Spatiotemporal Tokens |
| BraVe:V-FA (TSM-50x2) | 93.1 | false | - | Broaden Your Views for Self-Supervised Video Learning |
| CrissCross (AudioSet) | 92.4 | false | AudioSet | Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Synchronicity |
| CVRL (R3D-50; K400) | 92.2 | false | Kinetics400 | Spatiotemporal Contrastive Video Representation Learning |
| AVID+CMA (Modified R2+1D-18 on Audioset) | 91.5 | false | Audioset (Audio+Video) | Audio-Visual Instance Discrimination with Cross-Modal Agreement |
| CrissCross (Kinetics400) | 91.5 | false | Kinetics400 | Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Synchronicity |
| VideoMAE(no extra data) | 91.3 | false | no extra data | VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training |