Temporal Recurrent Networks for Online Action Detection

Most work on temporal action detection is formulated as an offline problem, in which the start and end times of actions are determined after the entire video is fully observed. However, important real-time applications including surveillance and driver assistance systems require identifying actions as soon as each video frame arrives, based only on current and historical observations. In this paper, we propose a novel framework, Temporal Recurrent Network (TRN), to model greater temporal context of a video frame by simultaneously performing online action detection and anticipation of the immediate future. At each moment in time, our approach makes use of both accumulated historical evidence and predicted future information to better recognize the action that is currently occurring, and integrates both of these into a unified end-to-end architecture. We evaluate our approach on two popular online action detection datasets, HDD and TVSeries, as well as another widely used dataset, THUMOS'14. The results show that TRN significantly outperforms the state-of-the-art.