Temporal Action Localization On Thumos14

Evaluation metrics

mAP IOU@0.1

mAP IOU@0.2

mAP IOU@0.3

mAP IOU@0.4

mAP IOU@0.5
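Each metric above is mean average precision (mAP) over the action classes at one temporal IoU (tIoU) threshold. A predicted segment counts as a true positive only if it overlaps a not-yet-matched ground-truth instance of the same class by at least that tIoU. The following is a minimal Python sketch of that computation, not the official THUMOS14 evaluation code. The tuple layouts, the function names (`temporal_iou`, `average_precision`, `mean_ap`), the toy `Diving` segments, and the use of ActivityNet-style interpolated precision-recall area for AP are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import List, Tuple

# Assumed record layouts for this sketch (not an official file format):
#   ground truth: (video_id, label, start_sec, end_sec)
#   prediction:   (video_id, label, start_sec, end_sec, score)
GT = Tuple[str, str, float, float]
Pred = Tuple[str, str, float, float, float]


def temporal_iou(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Intersection over union of two time segments given as (start, end)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds: List[Pred], gts: List[GT], thr: float) -> float:
    """AP for a single class at one tIoU threshold."""
    if not gts:
        return 0.0
    gt_by_video = defaultdict(list)
    for idx, (vid, _, s, e) in enumerate(gts):
        gt_by_video[vid].append((idx, s, e))

    matched = set()  # each ground-truth segment may be matched at most once
    flags = []       # True = true positive, False = false positive
    for vid, _, s, e, _ in sorted(preds, key=lambda p: p[4], reverse=True):
        # Try ground-truth segments in this video from highest to lowest overlap.
        cands = sorted(
            ((temporal_iou((s, e), (gs, ge)), idx) for idx, gs, ge in gt_by_video[vid]),
            reverse=True,
        )
        hit = next((idx for iou, idx in cands if iou >= thr and idx not in matched), None)
        if hit is not None:
            matched.add(hit)
        flags.append(hit is not None)

    precisions, recalls, tp = [], [], 0
    for rank, is_tp in enumerate(flags, start=1):
        tp += is_tp
        precisions.append(tp / rank)
        recalls.append(tp / len(gts))

    # Interpolate: precision at each rank becomes the max precision at any later rank.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_ap(preds: List[Pred], gts: List[GT], thr: float) -> float:
    """mAP at one tIoU threshold: the mean of the per-class APs."""
    labels = sorted({g[1] for g in gts})
    aps = [
        average_precision([p for p in preds if p[1] == c], [g for g in gts if g[1] == c], thr)
        for c in labels
    ]
    return sum(aps) / len(aps)


# Made-up toy data: two ground-truth instances, three scored predictions.
gts = [("v1", "Diving", 10.0, 20.0), ("v1", "Diving", 30.0, 40.0)]
preds = [
    ("v1", "Diving", 11.0, 21.0, 0.9),
    ("v1", "Diving", 50.0, 55.0, 0.8),
    ("v1", "Diving", 29.0, 38.0, 0.7),
]
for thr in (0.1, 0.2, 0.3, 0.4, 0.5):
    print(f"mAP@tIoU {thr}: {mean_ap(preds, gts, thr):.4f}")  # prints 0.8333 each time
```

In this toy case both correct detections overlap their ground truth by more than 0.5 tIoU, so all five thresholds give the same value. On real predictions, raising the threshold rejects loosely aligned segments, which is why every row in the table below decreases from left to right.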

Evaluation results

Performance of each model on this benchmark

Model	mAP IOU@0.1	mAP IOU@0.2	mAP IOU@0.3	mAP IOU@0.4	mAP IOU@0.5	Paper Title	Repository
GCM	72.5	70.9	66.5	60.8	51.9	Graph Convolutional Module for Temporal Action Localization in Videos	-
TemporalMaxer (I3D features)	-	-	82.8	78.9	71.8	TemporalMaxer: Maximize Temporal Context with only Max Pooling for Temporal Action Localization
CDC	-	-	40.1	29.4	23.3	CDC: Convolutional-De-Convolutional Networks for Precise Temporal Action Localization in Untrimmed Videos
TAL-Net	59.8	57.1	53.2	48.5	42.8	Rethinking the Faster R-CNN Architecture for Temporal Action Localization	-
DaoTAD	-	-	62.8	59.5	53.8	RGB Stream Is Enough for Temporal Action Detection
InternVideo2-1B	-	-	-	-	-	InternVideo2: Scaling Foundation Models for Multimodal Video Understanding
DualDETR (I3D features)	-	-	82.9	78.0	70.4	Dual DETRs for Multi-Label Temporal Action Detection	-
DCAN (TSN features)	-	-	68.2	62.7	54.1	DCAN: Improving Temporal Action Detection via Dual Context Aggregation
TURN-FL-16 + S-CNN	54	50.9	44.1	34.9	25.6	TURN TAP: Temporal Unit Regression Network for Temporal Action Proposals
RDFA-S6 (InternVideo2-6B)	-	-	88.7	84.6	78.2	Enhancing Temporal Action Localization: Advanced S6 Modeling with Recurrent Mechanism
MUSES	-	-	68.9	64.0	56.9	Multi-shot Temporal Event Localization: a Benchmark
TadML (two-stream)	-	-	73.29	69.73	62.53	TadML: A fast temporal action detection with Mechanics-MLP
TSP	74.02	72.29	69.1	63.3	53.5	TSP: Temporally-Sensitive Pretraining of Video Encoders for Localization Tasks
ReAct (TSN features)	-	-	69.2	65.0	57.1	ReAct: Temporal Action Detection with Relational Queries
BasicTAD (160,6,192,R50-SlowOnly)	-	-	75.5	70.8	63.5	BasicTAD: an Astounding RGB-Only Baseline for Temporal Action Detection
TadTR	-	-	74.8	69.1	60.1	End-to-end Temporal Action Detection with Transformer
ActionFormer (VideoMAE V2-g features)	-	-	84.0	79.6	73.0	VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
InternVideo2-6B	-	-	-	-	-	InternVideo2: Scaling Foundation Models for Multimodal Video Understanding
ActionFormer (InternVideo features)	-	-	-	-	-	InternVideo: General Video Foundation Models via Generative and Discriminative Learning
AVFusion	-	-	70.1	64.9	57.1	Hear Me Out: Fusional Approaches for Audio Augmented Temporal Action Localization
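A "-" in the table means no value is listed for that threshold, so a comparison at a given threshold should skip those models rather than treat the entry as zero. The following minimal Python sketch ranks a few rows copied from the table on a single column. The helper names `parse` and `rank_by` are hypothetical, and the rows are shortened to the model name plus the five mAP columns.

```python
from typing import List, Optional, Tuple

# A subset of rows from the table (model, mAP@0.1 ... mAP@0.5), tab-separated.
ROWS = [
    "GCM\t72.5\t70.9\t66.5\t60.8\t51.9",
    "TemporalMaxer (I3D features)\t-\t-\t82.8\t78.9\t71.8",
    "RDFA-S6 (InternVideo2-6B)\t-\t-\t88.7\t84.6\t78.2",
    "InternVideo2-1B\t-\t-\t-\t-\t-",
    "TAL-Net\t59.8\t57.1\t53.2\t48.5\t42.8",
]


def parse(row: str) -> Tuple[str, List[Optional[float]]]:
    """Split a row into the model name and its five mAP values ("-" becomes None)."""
    fields = row.split("\t")
    scores = [None if f.strip() == "-" else float(f) for f in fields[1:6]]
    return fields[0], scores


def rank_by(rows: List[str], col: int) -> List[Tuple[str, float]]:
    """Rank models on one threshold column (0 = IoU 0.1, ..., 4 = IoU 0.5),
    skipping models that list no value for that column."""
    parsed = [parse(r) for r in rows]
    scored = [(name, s[col]) for name, s in parsed if s[col] is not None]
    return sorted(scored, key=lambda x: x[1], reverse=True)


for name, score in rank_by(ROWS, 4):
    print(f"{score:5.1f}  {name}")
```

Because most rows omit mAP@0.1 and mAP@0.2, a ranking on those columns covers far fewer models than one on mAP@0.5.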
