AdaTAD (VideoMAEv2-giant) | 41.93 | 61.72 | 43.35 | 10.85 | End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames | |
TadTR (TSP features) | 36.75 | 53.62 | 37.52 | 10.56 | End-to-end Temporal Action Detection with Transformer | |
E2E-TAD (SlowFast R50+TadTR) | 35.10 | 50.47 | 35.99 | 10.83 | An Empirical Study of End-to-End Temporal Action Detection | |
HCN (I3D features) | 35.61 | 52.51 | 36.10 | 7.12 | Improve Temporal Action Proposals using Hierarchical Context | - |
ActionMamba (InternVideo2-6B) | 42.02 | 62.43 | 43.49 | 10.23 | Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding | |
VSGN (TSP features) | 35.94 | 53.26 | 36.76 | 8.12 | Video Self-Stitching Graph Network for Temporal Action Localization | |