Long Short-Term Transformer for Online Action Detection

Xu, Mingze; Xiong, Yuanjun; Chen, Hao; Li, Xinyu; Xia, Wei; Tu, Zhuowen; Soatto, Stefano
Abstract

We present Long Short-term TRansformer (LSTR), a temporal modeling algorithm for online action detection, which employs a long- and short-term memory mechanism to model prolonged sequence data. It consists of an LSTR encoder that dynamically leverages coarse-scale historical information from an extended temporal window (e.g., 2048 frames spanning up to 8 minutes), together with an LSTR decoder that focuses on a short time window (e.g., 32 frames spanning 8 seconds) to model the fine-scale characteristics of the data. Compared to prior work, LSTR provides an effective and efficient method to model long videos with fewer heuristics, as validated by extensive empirical analysis. LSTR achieves state-of-the-art performance on three standard online action detection benchmarks: THUMOS'14, TVSeries, and HACS Segment. Code is available at: https://xumingze0308.github.io/projects/lstr
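The long/short memory split can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation (their code is at the project page above). It assumes that the short-term memory is the most recent 32 feature steps and that the long-term memory is the up-to-2048 steps immediately preceding it. It also replaces the learned multi-layer transformer with single-head attention that has no learned projections. The names `split_memory`, `attend`, `lstr_step`, and the fixed `latents` vectors (standing in for learnable encoder queries) are hypothetical.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def split_memory(stream: List[Vector], long_len: int = 2048,
                 short_len: int = 32) -> Tuple[List[Vector], List[Vector]]:
    """Split the features observed so far into long- and short-term memory.

    Assumption: short-term memory = the last `short_len` steps; long-term
    memory = up to `long_len` steps immediately before that window.
    Requires short_len > 0.
    """
    short = stream[-short_len:]
    history = stream[:-short_len]  # empty while the stream is still short
    return history[-long_len:], short


def attend(queries: List[Vector], memory: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product attention without learned projections."""
    if not queries:
        return []
    if not memory:
        return [list(q) for q in queries]  # nothing to attend to yet
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, m)) for m in memory]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * m[d] for w, m in zip(weights, memory)) / total
                    for d in range(dim)])
    return out


def lstr_step(stream: List[Vector], latents: List[Vector],
              long_len: int = 2048, short_len: int = 32) -> List[Vector]:
    """One simplified online step: compress long memory, then decode short memory."""
    long_mem, short_mem = split_memory(stream, long_len, short_len)
    # "Encoder" (simplified): a few latent queries summarize the long history
    # into len(latents) coarse-scale tokens.
    compressed = attend(latents, long_mem)
    # "Decoder" (simplified): each short-term step queries the compressed history.
    return attend(short_mem, compressed)


random.seed(0)
DIM = 8
stream = [[random.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(3000)]
latents = [[random.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(16)]
long_mem, short_mem = split_memory(stream)
out = lstr_step(stream, latents)
print(len(long_mem), len(short_mem), len(out), len(out[0]))  # 2048 32 32 8
```

In this sketch the 16 latent queries read the 2048-step history at a cost proportional to 16 × 2048, rather than 2048² for full self-attention over the long window. Each of the 32 short-term steps then attends over only 16 compressed tokens.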