HyperAI
Action Classification On Kinetics 400
Metrics
Acc@1 (top-1 accuracy, %)
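Acc@1 is the percentage of test videos for which the model's highest-scoring class matches the ground-truth label. The Python sketch below illustrates that computation. It is not taken from any of the listed papers. It assumes one score vector per video. Kinetics-400 results are often reported with multi-view testing (several temporal clips and spatial crops per video), so the sketch also includes a hypothetical `average_views` helper that averages per-view scores before the argmax. Individual papers may combine views differently.

```python
from typing import List, Sequence


def average_views(view_scores: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical helper: average class scores over several views (clips/crops) of one video."""
    if not view_scores:
        raise ValueError("need at least one view")
    n_views = len(view_scores)
    n_classes = len(view_scores[0])
    return [sum(view[c] for view in view_scores) / n_views for c in range(n_classes)]


def top1_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Return top-1 accuracy in percent: share of videos whose argmax class equals the label."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("need at least one example")
    correct = 0
    for video_scores, label in zip(scores, labels):
        # Index of the highest score (ties resolve to the lowest class index).
        predicted = max(range(len(video_scores)), key=video_scores.__getitem__)
        correct += int(predicted == label)
    return 100.0 * correct / len(labels)


if __name__ == "__main__":
    # Toy example with 3 classes instead of Kinetics-400's 400.
    scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.2, 0.6], [0.9, 0.05, 0.05]]
    labels = [1, 0, 1, 0]
    print(top1_accuracy(scores, labels))  # 3 of 4 correct -> 75.0
```

In the toy example the third video is predicted as class 2 while its label is 1, so three of four predictions are correct and the script prints 75.0.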
Results
Performance of different models on this benchmark
| Model name | Acc@1 (%) | Paper title |
|---|---|---|
| OmniVec2 | 93.6 | OmniVec2 - A Novel Transformer based Network for Large Scale Multimodal and Multitask Learning |
| FTP-UniFormerV2-L/14 | 93.4 | Enhancing Video Transformers for Action Understanding with VLM-aided Training |
| InternVideo2-6B | 92.1 | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding |
| InternVideo2-1B | 91.6 | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding |
| OmniVec | 91.1 | OmniVec: Learning robust representations with cross modal sharing |
| InternVideo | 91.1 | InternVideo: General Video Foundation Models via Generative and Discriminative Learning |
| TubeViT-H (ImageNet-1k) | 90.9 | Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning |
| UMT-L (ViT-L/16) | 90.6 | Unmasked Teacher: Towards Training-Efficient Video Foundation Models |
| Unmasked Teacher (ViT-L) | 90.6 | Unmasked Teacher: Towards Training-Efficient Video Foundation Models |
| TubeViT-L (ImageNet-1k) | 90.2 | Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning |
| UniFormerV2-L (ViT-L, 336) | 90.0 | UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer |
| VideoMAE V2-g (64x266x266) | 90.0 | VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking |
| MTV-H (WTS 60M) | 89.9 | Multiview Transformers for Video Recognition |
| TAdaFormer-L/14 | 89.9 | Temporally-Adaptive Models for Efficient Video Understanding |
| EVA | 89.7 | EVA: Exploring the Limits of Masked Visual Representation Learning at Scale |
| AM/12 ViT-B Dinov2 | 89.6 | AM Flow: Adapters for Temporal Processing in Action Recognition |
| ATM | 89.4 | What Can Simple Arithmetic Operations Do for Temporal Modeling? |
| CoCa (finetuned) | 88.9 | CoCa: Contrastive Captioners are Image-Text Foundation Models |
| ILA (ViT-L/14) | 88.7 | Implicit Temporal Modeling with Learnable Alignment for Video Recognition |
| BIKE (CLIP ViT-L/14) | 88.7 | Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models |
Showing the top 20 of 204 results.