Action Recognition In Videos
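Action recognition is a computer vision task that aims to identify and classify human actions in videos or images. The goal is to assign the action performed in a video or image to one of a set of predefined action categories, enabling accurate action detection and understanding. The task is valuable for applications such as video surveillance, human-computer interaction, and sports analytics. However, because large-scale video datasets are difficult to build, most existing action recognition benchmarks are small, typically containing only around 10k videos.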
ActionNet-VE
ActivityNet
Text4Vis (w/ ViT-L)
Animal Kingdom
AVA v2.1
AVA v2.2
LART (Hiera-H, K700 PT+FT)
BAR
Charades
Charades-Ego
LaViLa (Finetuned, TimeSformer-L)
Diving-48
Drone-Action
DVS128 Gesture
EgoGesture
EPIC-KITCHENS-55
EPIC-KITCHENS-100
Avion (ViT-L)
H2O (2 Hands and Objects)
HandFormer-B/21x8
HAA500
HACS
UniFormerV2-L
HMDB-51
VideoMAE V2-g
HMDB51
MSQNet
Hockey
ICVL-4
IndustReal
IRD
Jester (Gesture Recognition)
DirecFormer
KTH
CNN-GRU
MECCANO
SlowFast
Mimetics
JMRN
miniSports
MTL-AQA
C3D-AVG
N-UCLA
DVANet
NEC Drone
NTU RGB+D
PoseC3D (RGB + Pose)
NTU RGB+D 120
PoseC3D (RGB + Pose)
Okutama-Action
Penn Action
RareAct
Real Life Violence Situations Dataset
DeVTr
RoCoG-v2
Skeleton-Mimetics
SL-Animals
SEW-Resnet18 (3sets)
Something-Something V1
InternVideo
Something-Something V2
MVD (Kinetics400 pretrain, ViT-H, 16 frame)
Sports-1M
ip-CSN-152 (RGB)
THUMOS’14
BMN
THUMOS14
UAV-Human
PMI Sampler
UAV Human
FAR
UCF-101
R3D-18
UCF 101
R2+1D-BERT
UCF101
VideoMAE V2-g
UCFSports
UTD-MHAD
VIRAT Ground 2.0
Volleyball
PoseC3D (Pose Only)
Win-Fail Action Understanding
2DCNN+TRN