SoccerNet: A Scalable Dataset for Action Spotting in Soccer Videos

In this paper, we introduce SoccerNet, a benchmark for action spotting in soccer videos. The dataset is composed of 500 complete soccer games from six main European leagues, covering three seasons from 2014 to 2017 and a total duration of 764 hours. A total of 6,637 temporal annotations are automatically parsed from online match reports at a one minute resolution for three main classes of events (Goal, Yellow/Red Card, and Substitution). As such, the dataset is easily scalable. These annotations are manually refined to a one second resolution by anchoring them at a single timestamp following well-defined soccer rules. With an average of one event every 6.9 minutes, this dataset focuses on the problem of localizing very sparse events within long videos. We define the task of spotting as finding the anchors of soccer events in a video. Making use of recent developments in the realm of generic action recognition and detection in video, we provide strong baselines for detecting soccer events. We show that our best model for classifying temporal segments of length one minute reaches a mean Average Precision (mAP) of 67.8%. For the spotting task, our baseline reaches an Average-mAP of 49.7% for tolerances $\delta$ ranging from 5 to 60 seconds. Our dataset and models are available at https://silviogiancola.github.io/SoccerNet.
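
To make the spotting metric concrete, the following is a minimal sketch of tolerance-based scoring for a single event class. It is not the authors' evaluation code. The function names (`minutes_per_event`, `average_precision`, `average_ap`) are hypothetical, and several choices are assumptions made for illustration: a predicted spot counts as correct when it falls within $\delta/2$ seconds of a not-yet-matched anchor, the precision is non-interpolated, and the score over tolerances is a plain mean on a 5-second grid rather than an area under the mAP curve. A full mAP would additionally average the per-class scores over the three event classes.

```python
from typing import List, Sequence, Tuple


def minutes_per_event(total_hours: float, num_events: int) -> float:
    """Average gap between annotated events, in minutes."""
    return total_hours * 60.0 / num_events


def average_precision(
    preds: Sequence[Tuple[float, float]],  # (time in seconds, confidence)
    anchors: Sequence[float],              # ground-truth anchor times in seconds
    tolerance: float,                      # delta, in seconds
) -> float:
    """Non-interpolated AP for one class at one tolerance.

    Assumption: a spot is a true positive if it lies within tolerance / 2
    of an anchor that no higher-confidence spot has already claimed.
    """
    if not anchors:
        return 0.0
    matched = set()
    hits: List[bool] = []
    # Visit predictions from most to least confident.
    for time_s, _conf in sorted(preds, key=lambda p: -p[1]):
        best, best_dist = None, tolerance / 2.0
        for i, anchor in enumerate(anchors):
            dist = abs(time_s - anchor)
            if i not in matched and dist <= best_dist:
                best, best_dist = i, dist
        if best is not None:
            matched.add(best)
        hits.append(best is not None)
    # Sum precision at each rank where a true positive occurs.
    true_pos, precision_sum = 0, 0.0
    for rank, hit in enumerate(hits, start=1):
        if hit:
            true_pos += 1
            precision_sum += true_pos / rank
    return precision_sum / len(anchors)


def average_ap(
    preds: Sequence[Tuple[float, float]],
    anchors: Sequence[float],
    tolerances: Sequence[float] = tuple(range(5, 65, 5)),  # 5 s to 60 s
) -> float:
    """Mean of the per-tolerance AP values (a simple stand-in for Average-mAP)."""
    scores = [average_precision(preds, anchors, t) for t in tolerances]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Sparsity figure from the abstract: 764 hours, 6,637 annotations.
    print(round(minutes_per_event(764, 6637), 1))
    # Toy example: two anchors, three candidate spots.
    toy_anchors = [100.0, 500.0]
    toy_preds = [(101.0, 0.9), (530.0, 0.8), (900.0, 0.3)]
    print(average_precision(toy_preds, toy_anchors, 5))
    print(average_precision(toy_preds, toy_anchors, 60))
```

In the toy example, the spot at 530 s is 30 s away from its anchor, so under the stated assumption it only counts as correct at the largest tolerance. This illustrates why the score is reported over a range of $\delta$ values rather than at a single threshold.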