A Dynamic Spatial-temporal Attention Network for Early Anticipation of Traffic Accidents

Rapid advances in sensor technologies and artificial intelligence are creating new opportunities to enhance traffic safety. Dashboard cameras (dashcams) are now widely deployed on both human-driven and automated vehicles. A computational intelligence model that can accurately and promptly predict accidents from dashcam video will improve preparedness for accident prevention. However, the spatial-temporal interaction of traffic agents is complex, and the visual cues for predicting a future accident are deeply embedded in dashcam video data, so the early anticipation of traffic accidents remains a challenge. Inspired by the attention behavior of humans in visually perceiving accident risks, this paper proposes a Dynamic Spatial-Temporal Attention (DSTA) network for early accident anticipation from dashcam videos. The DSTA-network learns to select discriminative temporal segments of a video sequence with a Dynamic Temporal Attention (DTA) module. It also learns to focus on the informative spatial regions of frames with a Dynamic Spatial Attention (DSA) module. A Gated Recurrent Unit (GRU) is trained jointly with the attention modules to predict the probability of a future accident. An evaluation of the DSTA-network on two benchmark datasets shows that it exceeds state-of-the-art performance. A thorough ablation study that assesses the DSTA-network at the component level reveals how the network achieves this performance. Furthermore, this paper proposes a method to fuse the prediction scores from two complementary models and verifies its effectiveness in further boosting the performance of early accident anticipation.
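To make the attention idea concrete, the following is a minimal sketch of soft attention, the general mechanism that modules such as DSA and DTA build on. It is not the DSTA-network's formulation: the dot-product scoring, the helper names (`softmax`, `dot`, `attend`), and the toy vectors are all assumptions chosen for illustration, and the learned projections and the GRU update are omitted.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def attend(query, items):
    """Soft attention of `query` over `items` (assumed dot-product scoring).

    Returns the attention weights and the weighted sum of the items.
    """
    weights = softmax([dot(query, item) for item in items])
    dim = len(items[0])
    context = [
        sum(w * item[d] for w, item in zip(weights, items))
        for d in range(dim)
    ]
    return weights, context

# Hypothetical toy data: a recurrent hidden state and three region features.
hidden = [1.0, 0.0]
regions = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]

# Spatial attention: weight the regions of one frame by relevance to `hidden`.
spatial_weights, frame_context = attend(hidden, regions)

# Temporal attention: weight earlier hidden states by relevance to `hidden`.
past_hidden = [[0.0, 1.0], [1.0, 1.0], [3.0, 0.0]]
temporal_weights, temporal_context = attend(hidden, past_hidden)
```

In this sketch the same routine serves both roles: applied to the regions of a frame it yields a spatially weighted frame feature, and applied to earlier hidden states it yields a temporally weighted summary. In a full model, such features would feed a recurrent unit that outputs the per-frame accident probability.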