HyperAI
Action Recognition In Videos On Something
Metrics
- Top-1 Accuracy
- Top-5 Accuracy
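Top-1 accuracy counts a prediction as correct when the highest-scoring class equals the true label; Top-5 accuracy counts it as correct when the true label appears anywhere among the five highest-scoring classes. A minimal sketch of the standard definition (the function name and toy scores below are illustrative, not taken from this leaderboard):

```python
def topk_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    correct = 0
    for row, label in zip(scores, labels):
        # indices of the k largest scores for this sample
        topk = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        correct += label in topk
    return correct / len(labels)

# Toy example: 2 video clips, 3 classes
scores = [[0.1, 0.7, 0.2],   # top prediction: class 1
          [0.5, 0.3, 0.2]]   # top prediction: class 0
labels = [1, 2]
print(topk_accuracy(scores, labels, 1))  # 0.5
```

With k equal to the number of classes every sample is trivially counted correct, which is why Top-5 is always at least as high as Top-1 in the table below.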
Results

Performance results of various models on this benchmark.

| Model name | Top-1 Accuracy | Top-5 Accuracy | Paper Title |
| --- | --- | --- | --- |
| MVD (Kinetics400 pretrain, ViT-H, 16 frame) | 77.3 | 95.7 | Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning |
| InternVideo | 77.2 | - | InternVideo: General Video Foundation Models via Generative and Discriminative Learning |
| InternVideo2-1B | 77.1 | - | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding |
| VideoMAE V2-g | 77.0 | 95.9 | VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking |
| MVD (Kinetics400 pretrain, ViT-L, 16 frame) | 76.7 | 95.5 | Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning |
| Hiera-L (no extra data) | 76.5 | - | Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles |
| TubeViT-L | 76.1 | 95.2 | Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning |
| VideoMAE (no extra data, ViT-L, 32x2) | 75.4 | 95.2 | VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training |
| Side4Video (EVA ViT-E/14) | 75.2 | 94.0 | Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning |
| MaskFeat (Kinetics600 pretrain, MViT-L) | 75.0 | 95.0 | Masked Feature Prediction for Self-Supervised Visual Pre-Training |
| MAR (50% mask, ViT-L, 16x4) | 74.7 | 94.9 | MAR: Masked Autoencoders for Efficient Action Recognition |
| ATM | 74.6 | 94.4 | What Can Simple Arithmetic Operations Do for Temporal Modeling? |
| MAWS (ViT-L) | 74.4 | - | The effectiveness of MAE pre-pretraining for billion-scale pretraining |
| VideoMAE (no extra data, ViT-L, 16 frame) | 74.3 | 94.6 | VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training |
| MAR (75% mask, ViT-L, 16x4) | 73.8 | 94.4 | MAR: Masked Autoencoders for Efficient Action Recognition |
| ViC-MAE (ViT-L) | 73.7 | - | ViC-MAE: Self-Supervised Representation Learning from Images and Video with Contrastive Masked Autoencoders |
| MVD (Kinetics400 pretrain, ViT-B, 16 frame) | 73.7 | 94.0 | Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning |
| TAdaFormer-L/14 | 73.6 | - | Temporally-Adaptive Models for Efficient Video Understanding |
| TDS-CLIP-ViT-L/14 (8 frames) | 73.4 | 93.8 | TDS-CLIP: Temporal Difference Side Network for Image-to-Video Transfer Learning |
| AMD (ViT-B/16) | 73.3 | 94.0 | Asymmetric Masked Distillation for Pre-Training Small Foundation Models |
The full leaderboard contains 122 entries.