A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection

A unified deep neural network, denoted the multi-scale CNN (MS-CNN), is proposed for fast multi-scale object detection. The MS-CNN consists of a proposal sub-network and a detection sub-network. In the proposal sub-network, detection is performed at multiple output layers, so that receptive fields match objects of different scales. These complementary scale-specific detectors are combined to produce a strong multi-scale object detector. The unified network is learned end-to-end by optimizing a multi-task loss. Feature upsampling by deconvolution is also explored, as an alternative to input upsampling, to reduce memory and computation costs. State-of-the-art object detection performance, at up to 15 fps, is reported on datasets such as KITTI and Caltech, which contain a substantial number of small objects.
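
The two mechanisms named in the abstract, receptive fields that grow with depth and feature upsampling by deconvolution, can be illustrated with a little arithmetic. The sketch below is not the MS-CNN architecture: `TOY_LAYERS` is a hypothetical stack of convolution and pooling layers chosen only for illustration, and the helper names are assumptions. It uses the standard receptive-field recurrence and the standard transposed-convolution output-size formula (no dilation, no output padding).

```python
from typing import List, Tuple

# (name, kernel_size, stride); a hypothetical toy stack, NOT the MS-CNN layers.
Layer = Tuple[str, int, int]

TOY_LAYERS: List[Layer] = [
    ("conv1", 3, 1),
    ("pool1", 2, 2),
    ("conv2", 3, 1),
    ("pool2", 2, 2),
    ("conv3", 3, 1),
    ("pool3", 2, 2),
]


def receptive_fields(layers: List[Layer]) -> List[Tuple[str, int, int]]:
    """Return (name, receptive_field, cumulative_stride) after each layer.

    Recurrence: rf_out = rf_in + (kernel - 1) * jump_in,
                jump_out = jump_in * stride,
    where jump is the distance in input pixels between adjacent outputs.
    """
    rf, jump = 1, 1
    result = []
    for name, kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
        result.append((name, rf, jump))
    return result


def deconv_output_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a transposed convolution (deconvolution)."""
    return (size - 1) * stride - 2 * padding + kernel


if __name__ == "__main__":
    for name, rf, jump in receptive_fields(TOY_LAYERS):
        print(f"{name}: receptive field {rf} px, stride {jump} px")
    # A 4x4 kernel with stride 2 and padding 1 doubles the feature map size.
    print(deconv_output_size(20, kernel=4, stride=2, padding=1))
```

In this toy stack the receptive field grows from 3 pixels after the first layer to 22 pixels after the last, which is the sense in which shallower output layers suit smaller objects and deeper ones suit larger objects. The last call shows a stride-2 deconvolution doubling a feature map from 20 to 40 cells, so resolution is recovered at the feature level without enlarging the input image.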