HyperAI
Action Recognition in Videos on UCF101
Metrics
3-fold Accuracy
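As a rough illustration of the metric: UCF101 ships with three official train/test splits, and 3-fold accuracy is commonly reported as the mean top-1 accuracy over those splits. The sketch below assumes that convention. The function names and the toy predictions are hypothetical and are not taken from any paper listed on this page.

```python
from statistics import mean


def accuracy(predictions, labels):
    """Top-1 accuracy in percent for a single test split."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("a split must contain at least one sample")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return 100.0 * correct / len(labels)


def three_fold_accuracy(split_results):
    """Mean of the per-split accuracies over exactly three splits.

    split_results: list of three (predictions, labels) pairs, one per split.
    """
    if len(split_results) != 3:
        raise ValueError("expected results for exactly three splits")
    return mean(accuracy(preds, labels) for preds, labels in split_results)


# Hypothetical toy data, for illustration only (not real benchmark output).
splits = [
    (["a", "b", "c", "d"], ["a", "b", "c", "d"]),  # 4 of 4 correct
    (["a", "b", "c", "d"], ["a", "b", "c", "a"]),  # 3 of 4 correct
    (["a", "b", "a", "a"], ["a", "b", "c", "d"]),  # 2 of 4 correct
]
print(three_fold_accuracy(splits))  # 75.0
```

Averaging per-split accuracies gives each split equal weight. The three official test splits are of similar size, so pooling all samples instead would typically change the result only slightly.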
Results
Performance results of various models on this benchmark
| Model name | 3-fold Accuracy (%) | Paper title |
| --- | --- | --- |
| FTP-UniFormerV2-L/14 | 99.7 | Enhancing Video Transformers for Action Understanding with VLM-aided Training |
| OmniVec2 | 99.6 | OmniVec2 - A Novel Transformer based Network for Large Scale Multimodal and Multitask Learning |
| OmniVec | 99.6 | OmniVec: Learning robust representations with cross modal sharing |
| VideoMAE V2-g | 99.6 | VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking |
| BIKE | 98.8 | Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models |
| SMART | 98.64 | SMART Frame Selection for Action Recognition |
| OmniSource (SlowOnly-8x8-R101-RGB + I3D-Flow) | 98.6 | Omni-sourced Webly-supervised Learning for Video Recognition |
| PERF-Net (multi-distilled S3D) | 98.6 | PERF-Net: Pose Empowered RGB-Flow Net |
| ZeroI2V ViT-L/14 | 98.6 | ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video |
| LGD-3D Two-stream | 98.2 | Learning Spatio-Temporal Representation with Local and Global Diffusion |
| Text4Vis | 98.2 | Revisiting Classifier: Transferring Vision-Language Models for Video Recognition |
| Two-Stream I3D (Imagenet+Kinetics pre-training) | 98.0 | Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset |
| Two-Stream I3D (Kinetics pre-training) | 97.8 | Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset |
| MARS+RGB+Flow (64 frames, Kinetics pretrained) | 97.8 | MARS: Motion-Augmented RGB Stream for Action Recognition |
| HATNet (32 frames) | 97.8 | Large Scale Holistic Video Understanding |
| BubbleNET | 97.62 | Bubblenet: A Disperse Recurrent Structure To Recognize Activities |
| BQN | 97.6 | Busy-Quiet Video Disentangling for Video Classification |
| D3D + D3D | 97.6 | D3D: Distilled 3D Networks for Video Action Recognition |
| CCS + TSN (ImageNet+Kinetics pretrained) | 97.4 | Cooperative Cross-Stream Network for Discriminative Action Representation |
| R[2+1]D-TwoStream (Kinetics pretrained) | 97.3 | A Closer Look at Spatiotemporal Convolutions for Action Recognition |