4 months ago

Convolutional Neural Network

Video Processing

Action Recognition

Method/Architecture

Computer Vision

Gygli Michael

Abstract

Shot boundary detection (SBD) is an important component of many video analysis tasks, such as action recognition, video indexing, summarization and editing. Previous work typically used a combination of low-level features like color histograms, in conjunction with simple models such as SVMs. Instead, we propose to learn shot detection end-to-end, from pixels to final shot boundaries. For training such a model, we rely on our insight that all shot boundaries are generated. Thus, we create a dataset with one million frames and automatically generated transitions such as cuts, dissolves and fades. In order to efficiently analyze hours of video, we propose a Convolutional Neural Network (CNN) which is fully convolutional in time, allowing it to use a large temporal context without repeatedly processing frames. With this architecture our method obtains state-of-the-art results while running at an unprecedented speed of more than 120x real-time.
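The idea of automatically generating transitions can be illustrated with a minimal sketch. The abstract does not describe the generation procedure, so everything below is an assumption made for illustration: the function names `blend` and `make_transition`, the toy frame representation (a flat list of grayscale values), the linear blending schedule, and the labeling convention (1 marks a frame that belongs to a transition). It is not the paper's actual pipeline.

```python
from typing import List, Tuple

Frame = List[float]  # toy stand-in: one flattened grayscale frame (assumption)


def blend(a: Frame, b: Frame, alpha: float) -> Frame:
    """Linear cross-blend of two frames: alpha=0 returns a, alpha=1 returns b."""
    return [(1.0 - alpha) * x + alpha * y for x, y in zip(a, b)]


def make_transition(
    shot_a: List[Frame], shot_b: List[Frame], kind: str, length: int = 1
) -> Tuple[List[Frame], List[int]]:
    """Join two shots with a synthetic transition.

    Returns (frames, labels). The labeling convention is hypothetical:
    1 marks a transition frame, 0 marks an ordinary frame.
    """
    if not shot_a or not shot_b:
        raise ValueError("both shots must contain at least one frame")

    if kind == "cut":
        # Hard cut: mark the first frame of the new shot.
        frames = shot_a + shot_b
        labels = [0] * len(shot_a) + [1] + [0] * (len(shot_b) - 1)
        return frames, labels

    if not 1 <= length <= min(len(shot_a), len(shot_b)):
        raise ValueError("length must be between 1 and the shorter shot's length")

    head, tail_a = shot_a[:-length], shot_a[-length:]
    head_b, rest = shot_b[:length], shot_b[length:]

    if kind == "dissolve":
        # Overlap the end of shot A with the start of shot B.
        mixed = [
            blend(fa, fb, (i + 1) / (length + 1))
            for i, (fa, fb) in enumerate(zip(tail_a, head_b))
        ]
    elif kind == "fade":
        # Fade shot A out to black, then fade shot B in from black.
        black = [0.0] * len(shot_a[0])
        fade_out = [blend(fa, black, (i + 1) / length) for i, fa in enumerate(tail_a)]
        fade_in = [
            blend(black, fb, (i + 1) / (length + 1)) for i, fb in enumerate(head_b)
        ]
        mixed = fade_out + fade_in
    else:
        raise ValueError(f"unknown transition kind: {kind!r}")

    frames = head + mixed + rest
    labels = [0] * len(head) + [1] * len(mixed) + [0] * len(rest)
    return frames, labels
```

Because the transitions are synthesized, the per-frame labels come for free, which is what makes it feasible to build a large training set without manual annotation. A real pipeline would operate on decoded image tensors rather than flat lists.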

