HyperAI

Action Classification On Kinetics 700

Metrics

Top-1 Accuracy
Top-5 Accuracy
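
As a minimal sketch of how these two metrics are usually computed: a prediction counts as a top-k hit when the true label is among the k highest-scoring classes, and the accuracy is the percentage of clips that are hits. The function name `top_k_accuracy` and the toy scores below are illustrative assumptions, not taken from the benchmark or from any of the listed papers.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]], labels: Sequence[int], k: int
) -> float:
    """Percentage of samples whose true label is among the k highest-scoring classes.

    Hypothetical helper for illustration; not part of any benchmark toolkit.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("at least one sample is required")

    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k classes with the highest scores for this clip.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return 100.0 * hits / len(labels)


# Toy data (assumed): 3 clips, 4 classes. Kinetics-700 itself has 700 classes.
toy_scores = [
    [0.1, 0.6, 0.2, 0.1],  # true label 1 has the highest score
    [0.5, 0.1, 0.3, 0.1],  # true label 2 has the second-highest score
    [0.4, 0.3, 0.2, 0.1],  # true label 3 has the lowest score
]
toy_labels = [1, 2, 3]

top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
top2 = top_k_accuracy(toy_scores, toy_labels, k=2)
```

On the toy data, one of three clips is a top-1 hit and two of three are top-2 hits. Top-5 Accuracy is the same computation with `k=5` over the full class set.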

Results

Performance results of various models on this benchmark

| Model | Top-1 Accuracy (%) | Top-5 Accuracy (%) | Paper Title | Repository |
| --- | --- | --- | --- | --- |
| SRTG r(2+1)d-34 | 49.43 | 73.23 | Learn to cycle: Time-consistent feature discovery for action recognition | |
| MViTv2-B | 76.6 | 93.2 | MViTv2: Improved Multiscale Vision Transformers for Classification and Detection | |
| SRTG r3d-50 | 53.52 | 74.17 | Learn to cycle: Time-consistent feature discovery for action recognition | |
| MoViNet-A1 | 63.5 | - | MoViNets: Mobile Video Networks for Efficient Video Recognition | |
| MoViNet-A2 | 66.7 | - | MoViNets: Mobile Video Networks for Efficient Video Recognition | |
| MoViNet-A3 | 68.0 | - | MoViNets: Mobile Video Networks for Efficient Video Recognition | |
| VidTr-M | 69.5 | 88.3 | VidTr: Video Transformer Without Convolutions | - |
| InternVideo-T | 84.0 | - | InternVideo: General Video Foundation Models via Generative and Discriminative Learning | |
| EVA | 82.9 | - | EVA: Exploring the Limits of Masked Visual Representation Learning at Scale | |
| MoViNet-A4 | 70.7 | - | MoViNets: Mobile Video Networks for Efficient Video Recognition | |
| UniFormerV2-L | 82.7 | 96.2 | UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer | |
| MaskFeat (no extra data, MViT-L) | 80.4 | 95.7 | Masked Feature Prediction for Self-Supervised Visual Pre-Training | |
| InternVideo2-1B | 85.4 | - | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | |
| SRTG r3d-101 | 56.46 | 76.82 | Learn to cycle: Time-consistent feature discovery for action recognition | |
| VidTr-L | 70.2 | 89 | VidTr: Video Transformer Without Convolutions | - |
| SRTG r3d-34 | 49.15 | 72.68 | Learn to cycle: Time-consistent feature discovery for action recognition | |
| mPLUG-2 | 80.4 | 94.9 | mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video | |
| UMT-L (ViT-L/16) | 83.6 | 96.7 | Unmasked Teacher: Towards Training-Efficient Video Foundation Models | |
| CoVeR (JFT-3B) | 79.8 | 94.9 | Co-training Transformer with Videos and Images Improves Action Recognition | - |
| MViTv2-L (ImageNet-21k pretrain) | 79.4 | 94.9 | MViTv2: Improved Multiscale Vision Transformers for Classification and Detection | |