TadTR (I3D RGB) | 32.09 | 47.14 | 32.11 | 10.94 | End-to-end Temporal Action Detection with Transformer | |
ActionMamba (InternVideo2-6B) | 44.56 | 64.02 | 45.71 | 13.34 | Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding | |
LoFi+G-TAD (RGB, RN18) | 24.64 | 37.78 | 24.40 | 7.29 | Low-Fidelity Video Encoder Optimization for Temporal Action Localization | - |