HyperAI

Action Recognition

Action recognition is a computer vision task that aims to identify and classify human behaviors in videos or images. The goal is to assign the actions being performed to predefined action categories, enabling accurate action detection and understanding. The task is valuable for applications such as video surveillance, human-computer interaction, and sports analysis. However, because large-scale video datasets are difficult to build, most existing action recognition benchmarks are relatively small, typically containing only around 10k videos.
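As a rough illustration of what "categorizing actions into predefined categories" can look like in code, here is a minimal sketch. It assumes some upstream model has already produced one score per category for each frame, and it uses a simple averaging scheme to get a clip-level prediction. The label set, the `classify_clip` helper, and the scores are all hypothetical and are not taken from any benchmark or model listed on this page.

```python
import math

# Hypothetical label set; real benchmarks define their own categories.
ACTIONS = ["pushing something", "opening something", "waving"]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_clip(frame_scores, labels=ACTIONS):
    """Average per-frame class scores over time, then pick the top category."""
    num_frames = len(frame_scores)
    mean_scores = [
        sum(frame[c] for frame in frame_scores) / num_frames
        for c in range(len(labels))
    ]
    probs = softmax(mean_scores)
    best = max(range(len(labels)), key=lambda c: probs[c])
    return labels[best], probs[best]


# Made-up scores for a 3-frame clip (rows are frames, columns are categories).
scores = [
    [2.0, 0.5, 0.1],
    [1.5, 1.0, 0.2],
    [2.5, 0.3, 0.0],
]
label, confidence = classify_clip(scores)
print(label, round(confidence, 3))
```

Averaging over frames is only one possible aggregation choice. Methods on the leaderboards typically model temporal structure more directly, so this should be read as a toy baseline rather than a description of any listed approach.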

Something-Something V2

MSNet-R50En (8+16 ensemble, ImageNet pretrained)

Something-Something V1

VideoMAE (K700 pretrain+finetune, ViT-L, 16x4)

EPIC-KITCHENS-100

PoseC3D (RGB + Pose)

PoseC3D (RGB + Pose)

Text4Vis (w/ ViT-L)

H2O (2 Hands and Objects)

HandFormer-B/21x8

ip-CSN-152 (RGB)

LaViLa (Finetuned, TimeSformer-L)

PoseC3D (Pose Only)

Real Life Violence Situations Dataset

Jester (Gesture Recognition)

SEW-Resnet18 (3sets)

Win-Fail Action Understanding

VIRAT Ground 2.0

EPIC-KITCHENS-55

Skeleton-Mimetics

© HyperAI

