HyperAI
Action Recognition in Videos on UCF101
Metrics
3-fold Accuracy
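The page does not define this metric. Assuming it follows the usual UCF101 convention (top-1 accuracy computed on each of the dataset's three official train/test splits, then averaged), a minimal sketch looks like the following. The function names `split_accuracy` and `three_fold_accuracy` and the toy labels are hypothetical, chosen only for illustration.

```python
from statistics import mean


def split_accuracy(predictions, labels):
    """Top-1 accuracy, in percent, for a single test split."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal in length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return 100.0 * correct / len(labels)


def three_fold_accuracy(splits):
    """Unweighted mean of per-split accuracies.

    `splits` is a list of (predictions, labels) pairs, one per split.
    Assumption: each split counts equally, regardless of its size.
    """
    return mean(split_accuracy(preds, labels) for preds, labels in splits)


# Toy example with made-up labels (not real UCF101 results).
toy_splits = [
    (["run", "jump"], ["run", "jump"]),                          # 2/2 correct
    (["run", "jump"], ["run", "run"]),                           # 1/2 correct
    (["run", "run", "jump", "jump"], ["run", "run", "jump", "run"]),  # 3/4 correct
]
print(three_fold_accuracy(toy_splits))
```

Averaging per-split percentages, rather than pooling all clips, weights each split equally; individual papers on the leaderboard may report the number differently.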
Results
Performance results of various models on this benchmark
| Model Name | 3-fold Accuracy | Paper Title |
| --- | --- | --- |
| FTP-UniFormerV2-L/14 | 99.7 | Enhancing Video Transformers for Action Understanding with VLM-aided Training |
| OmniVec2 | 99.6 | OmniVec2 - A Novel Transformer based Network for Large Scale Multimodal and Multitask Learning |
| OmniVec | 99.6 | OmniVec: Learning robust representations with cross modal sharing |
| VideoMAE V2-g | 99.6 | VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking |
| BIKE | 98.8 | Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models |
| SMART | 98.64 | SMART Frame Selection for Action Recognition |
| OmniSource (SlowOnly-8x8-R101-RGB + I3D-Flow) | 98.6 | Omni-sourced Webly-supervised Learning for Video Recognition |
| PERF-Net (multi-distilled S3D) | 98.6 | PERF-Net: Pose Empowered RGB-Flow Net |
| ZeroI2V ViT-L/14 | 98.6 | ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video |
| LGD-3D Two-stream | 98.2 | Learning Spatio-Temporal Representation with Local and Global Diffusion |
| Text4Vis | 98.2 | Revisiting Classifier: Transferring Vision-Language Models for Video Recognition |
| Two-Stream I3D (ImageNet+Kinetics pre-training) | 98.0 | Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset |
| Two-Stream I3D (Kinetics pre-training) | 97.8 | Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset |
| MARS+RGB+Flow (64 frames, Kinetics pretrained) | 97.8 | MARS: Motion-Augmented RGB Stream for Action Recognition |
| HATNet (32 frames) | 97.8 | Large Scale Holistic Video Understanding |
| BubbleNET | 97.62 | Bubblenet: A Disperse Recurrent Structure To Recognize Activities |
| BQN | 97.6 | Busy-Quiet Video Disentangling for Video Classification |
| D3D + D3D | 97.6 | D3D: Distilled 3D Networks for Video Action Recognition |
| CCS + TSN (ImageNet+Kinetics pretrained) | 97.4 | Cooperative Cross-Stream Network for Discriminative Action Representation |
| R[2+1]D-TwoStream (Kinetics pretrained) | 97.3 | A Closer Look at Spatiotemporal Convolutions for Action Recognition |
Showing the top 20 of 90 rows.