Ridiculously Fast Shot Boundary Detection with Fully Convolutional Neural Networks

Shot boundary detection (SBD) is an important component of many videoanalysis tasks, such as action recognition, video indexing, summarization andediting. Previous work typically used a combination of low-level features likecolor histograms, in conjunction with simple models such as SVMs. Instead, wepropose to learn shot detection end-to-end, from pixels to final shotboundaries. For training such a model, we rely on our insight that all shotboundaries are generated. Thus, we create a dataset with one million frames andautomatically generated transitions such as cuts, dissolves and fades. In orderto efficiently analyze hours of videos, we propose a Convolutional NeuralNetwork (CNN) which is fully convolutional in time, thus allowing to use alarge temporal context without the need to repeatedly processing frames. Withthis architecture our method obtains state-of-the-art results while running atan unprecedented speed of more than 120x real-time.