HyperAI

Temporal Action Localization on THUMOS14

Metrics

mAP IOU@0.1
mAP IOU@0.2
mAP IOU@0.3
mAP IOU@0.4
mAP IOU@0.5
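
The metrics above are mean average precision (mAP) values at fixed temporal IoU thresholds. As a rough illustration of what such a number measures, the sketch below computes temporal IoU between two segments and a simplified, non-interpolated average precision for a single action class. The function names (`temporal_iou`, `average_precision`, `mean_average_precision`) and the input layout are assumptions made for this example. This is not the official THUMOS14 evaluation code, which may differ in matching and interpolation details.

```python
def temporal_iou(seg_a, seg_b):
    """Intersection over union of two (start, end) intervals, e.g. in seconds."""
    inter = max(0.0, min(seg_a[1], seg_b[1]) - max(seg_a[0], seg_b[0]))
    union = (seg_a[1] - seg_a[0]) + (seg_b[1] - seg_b[0]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(predictions, ground_truth, iou_threshold):
    """Simplified AP for one class (illustrative layout, not an official format).

    predictions:  list of (video_id, start, end, score)
    ground_truth: dict mapping video_id -> list of (start, end)
    """
    total_gt = sum(len(segs) for segs in ground_truth.values())
    if total_gt == 0:
        return 0.0
    matched = {vid: [False] * len(segs) for vid, segs in ground_truth.items()}
    true_positives = 0
    precision_sum = 0.0
    # Visit predictions from highest to lowest confidence score.
    ranked = sorted(predictions, key=lambda p: -p[3])
    for rank, (vid, start, end, _score) in enumerate(ranked, start=1):
        best_iou, best_idx = 0.0, -1
        for idx, gt_seg in enumerate(ground_truth.get(vid, [])):
            iou = temporal_iou((start, end), gt_seg)
            if iou > best_iou:
                best_iou, best_idx = iou, idx
        # A prediction is a true positive only if it overlaps enough with a
        # ground-truth segment that no higher-ranked prediction has claimed.
        if best_idx >= 0 and best_iou >= iou_threshold and not matched[vid][best_idx]:
            matched[vid][best_idx] = True
            true_positives += 1
            precision_sum += true_positives / rank
    return precision_sum / total_gt


def mean_average_precision(per_class_inputs, iou_threshold):
    """Mean of per-class AP; per_class_inputs maps class -> (predictions, ground_truth)."""
    aps = [average_precision(p, g, iou_threshold) for p, g in per_class_inputs.values()]
    return sum(aps) / len(aps) if aps else 0.0
```

Under this sketch, a column such as mAP IOU@0.5 would correspond to `mean_average_precision(..., iou_threshold=0.5)`. Raising the threshold demands tighter temporal boundaries, which is why scores in the table fall from left to right.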

Results

Performance results of various models on this benchmark

| Model name | mAP IOU@0.1 | mAP IOU@0.2 | mAP IOU@0.3 | mAP IOU@0.4 | mAP IOU@0.5 | Paper Title | Repository |
|---|---|---|---|---|---|---|---|
| GCM | 72.5 | 70.9 | 66.5 | 60.8 | 51.9 | Graph Convolutional Module for Temporal Action Localization in Videos | - |
| TemporalMaxer (I3D features) | - | - | 82.8 | 78.9 | 71.8 | TemporalMaxer: Maximize Temporal Context with only Max Pooling for Temporal Action Localization | |
| CDC | - | - | 40.1 | 29.4 | 23.3 | CDC: Convolutional-De-Convolutional Networks for Precise Temporal Action Localization in Untrimmed Videos | |
| TAL-Net | 59.8 | 57.1 | 53.2 | 48.5 | 42.8 | Rethinking the Faster R-CNN Architecture for Temporal Action Localization | - |
| DaoTAD | - | - | 62.8 | 59.5 | 53.8 | RGB Stream Is Enough for Temporal Action Detection | |
| InternVideo2-1B | - | - | - | - | - | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | |
| DualDETR (I3D features) | - | - | 82.9 | 78.0 | 70.4 | Dual DETRs for Multi-Label Temporal Action Detection | - |
| DCAN (TSN features) | - | - | 68.2 | 62.7 | 54.1 | DCAN: Improving Temporal Action Detection via Dual Context Aggregation | |
| TURN-FL-16 + S-CNN | 54 | 50.9 | 44.1 | 34.9 | 25.6 | TURN TAP: Temporal Unit Regression Network for Temporal Action Proposals | |
| RDFA-S6 (InternVideo2-6B) | - | - | 88.7 | 84.6 | 78.2 | Enhancing Temporal Action Localization: Advanced S6 Modeling with Recurrent Mechanism | - |
| MUSES | - | - | 68.9 | 64.0 | 56.9 | Multi-shot Temporal Event Localization: a Benchmark | |
| TadML(two-stream) | - | - | 73.29 | 69.73 | 62.53 | TadML: A fast temporal action detection with Mechanics-MLP | |
| TSP | 74.02 | 72.29 | 69.1 | 63.3 | 53.5 | TSP: Temporally-Sensitive Pretraining of Video Encoders for Localization Tasks | |
| ReAct (TSN features) | - | - | 69.2 | 65.0 | 57.1 | ReAct: Temporal Action Detection with Relational Queries | |
| BasicTAD (160,6,192,R50-SlowOnly) | - | - | 75.5 | 70.8 | 63.5 | BasicTAD: an Astounding RGB-Only Baseline for Temporal Action Detection | |
| TadTR | - | - | 74.8 | 69.1 | 60.1 | End-to-end Temporal Action Detection with Transformer | |
| ActionFormer (VideoMAE V2-g features) | - | - | 84.0 | 79.6 | 73.0 | VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking | |
| InternVideo2-6B | - | - | - | - | - | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | |
| ActionFormer (InternVideo features) | - | - | - | - | - | InternVideo: General Video Foundation Models via Generative and Discriminative Learning | |
| AVFusion | - | - | 70.1 | 64.9 | 57.1 | Hear Me Out: Fusional Approaches for Audio Augmented Temporal Action Localization | |