Temporal Action Localization on ActivityNet

Metrics

mAP

mAP IOU@0.5

mAP IOU@0.75

mAP IOU@0.95
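
The IoU thresholds above refer to temporal overlap between a predicted segment and a ground-truth segment. A minimal sketch of that overlap test follows, assuming segments are given as (start, end) pairs in seconds. The function names `temporal_iou` and `matches_at` are hypothetical, chosen for illustration, and are not taken from any benchmark toolkit. The headline mAP figure on ActivityNet is commonly reported as an average over several IoU thresholds, but this sketch covers only the per-threshold match test, not the full mAP computation.

```python
def temporal_iou(pred, gt):
    """Temporal IoU of two (start, end) segments (illustrative helper)."""
    # Length of the overlapping interval, clamped at zero for disjoint segments.
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    # Union is the sum of both lengths minus the overlap counted twice.
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def matches_at(pred, gt, threshold):
    """True if the prediction overlaps the ground truth at the given IoU threshold."""
    return temporal_iou(pred, gt) >= threshold


if __name__ == "__main__":
    pred, gt = (2.0, 10.0), (0.0, 10.0)
    for t in (0.5, 0.75, 0.95):
        print(t, matches_at(pred, gt, t))
```

A prediction that counts as correct at IoU 0.5 can fail at 0.95, which is why the stricter columns in the results table are much lower than the 0.5 column.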

Results

Performance results of various models on this benchmark

Model Name	mAP	mAP IOU@0.5	mAP IOU@0.75	mAP IOU@0.95	Paper Title	Repository
AdaTAD (VideoMAEv2-giant)	41.93	61.72	43.35	10.85	End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames
BSN++	34.88	51.27	35.70	8.33	BSN++: Complementary Boundary Regressor with Scale-Balanced Relation Modeling for Temporal Action Proposal Generation
PRN (CSN)	39.4	57.9	-	-	Proposal Relation Network for Temporal Action Detection
TSP	35.81	51.26	37.12	9.29	TSP: Temporally-Sensitive Pretraining of Video Encoders for Localization Tasks
GCM	34.24	51.03	35.17	7.44	Graph Convolutional Module for Temporal Action Localization in Videos	-
BSN	30.03	46.45	29.96	8.02	BSN: Boundary Sensitive Network for Temporal Action Proposal Generation
TAGS (I3D)	36.5	-	-	-	Proposal-Free Temporal Action Detection via Global Segmentation Mask Learning
TadTR (TSP features)	36.75	53.62	37.52	10.56	End-to-end Temporal Action Detection with Transformer
SSN	32.26	39.12	-	-	A Pursuit of Temporal Accuracy in General Activity Detection
BC-GNN	34.26	50.56	34.75	9.37	Boundary Content Graph Neural Network for Temporal Action Proposal Generation	-
E2E-TAD (SlowFast R50+TadTR)	35.10	50.47	35.99	10.83	An Empirical Study of End-to-End Temporal Action Detection
InternVideo2-6B	41.2	-	-	-	InternVideo2: Scaling Foundation Models for Multimodal Video Understanding
HCN (I3D features)	35.61	52.51	36.10	7.12	Improve Temporal Action Proposals using Hierarchical Context	-
LoFi+G-TAD	34.96	50.91	35.86	8.79	Low-Fidelity Video Encoder Optimization for Temporal Action Localization	-
UniMD+Sync.	39.83	60.29	-	-	UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection
RDFA-S6 (InternVideo2-6B)	42.9	64.1	44.0	10.6	Enhancing Temporal Action Localization: Advanced S6 Modeling with Recurrent Mechanism
ActionMamba (InternVideo2-6B)	42.02	62.43	43.49	10.23	Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding
AVFusion	36.82	54.34	37.66	8.93	Hear Me Out: Fusional Approaches for Audio Augmented Temporal Action Localization
InternVideo	39.00	-	-	-	InternVideo: General Video Foundation Models via Generative and Discriminative Learning
VSGN (TSP features)	35.94	53.26	36.76	8.12	Video Self-Stitching Graph Network for Temporal Action Localization
