RMS-Net: Regression and Masking for Soccer Event Spotting

The recently proposed action spotting task consists in finding the exact timestamp at which an event occurs. This task fits soccer videos particularly well, since events correspond to salient actions strictly defined by the rules of soccer (e.g., a goal occurs when the ball crosses the goal line). In this paper, we devise a lightweight and modular network for action spotting, which can simultaneously predict the event label and its temporal offset using the same underlying features. We enrich our model with two training strategies: the first for data balancing and uniform sampling, the second for masking ambiguous frames and keeping the most discriminative visual cues. When tested on the SoccerNet dataset and using standard features, our full proposal exceeds the current state of the art by 3 Average-mAP points. Additionally, it reaches a gain of more than 10 Average-mAP points on the test set when fine-tuned in combination with a strong 2D backbone.