Single-Image Crowd Counting via Multi-Column Convolutional Neural Network

This paper aims to develop a method that can accurately estimate the crowd count from an individual image with arbitrary crowd density and arbitrary perspective. To this end, we propose a simple but effective Multi-column Convolutional Neural Network (MCNN) architecture that maps the image to its crowd density map. The proposed MCNN allows the input image to be of arbitrary size or resolution. By utilizing filters with receptive fields of different sizes, the features learned by each column CNN are adaptive to variations in people/head size due to perspective effects or image resolution. Furthermore, the true density map is computed accurately based on geometry-adaptive kernels, which do not require knowledge of the perspective map of the input image. Since existing crowd counting datasets do not adequately cover all the challenging situations considered in our work, we have collected and labelled a large new dataset that includes 1198 images with about 330,000 annotated heads. On this challenging new dataset, as well as on all existing datasets, we conduct extensive experiments to verify the effectiveness of the proposed model and method. In particular, with the proposed simple MCNN model, our method outperforms all existing methods. In addition, experiments show that our model, once trained on one dataset, can be readily transferred to a new dataset.
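
As an illustration of how a density map can be built from head annotations without a perspective map, the following is a minimal sketch, not the authors' implementation. It assumes each annotated head is spread by a Gaussian whose width is proportional to the mean distance to its nearest annotated neighbours, so kernels shrink where the crowd is dense. The function name `density_map` and the parameters `k`, `beta`, `min_sigma`, and `fallback_sigma` are hypothetical, and their default values are illustrative choices not taken from the text above.

```python
import numpy as np


def density_map(heads, shape, k=3, beta=0.3, min_sigma=1.0, fallback_sigma=4.0):
    """Sum one unit-mass Gaussian per annotated head.

    heads: sequence of (row, col) head positions in pixels.
    shape: (height, width) of the image.
    k, beta, min_sigma, fallback_sigma: illustrative assumptions.
    """
    heads = np.asarray(heads, dtype=float).reshape(-1, 2)
    height, width = shape
    rows = np.arange(height)[:, None]  # column vector of row indices
    cols = np.arange(width)[None, :]   # row vector of column indices
    density = np.zeros((height, width), dtype=float)

    for i, (r, c) in enumerate(heads):
        if len(heads) > 1:
            # Distances to all heads, sorted; index 0 is the head itself.
            dists = np.sort(np.linalg.norm(heads - heads[i], axis=1))[1:k + 1]
            sigma = max(beta * dists.mean(), min_sigma)
        else:
            # A lone head has no neighbours, so use a fixed width.
            sigma = fallback_sigma

        kernel = np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2.0 * sigma ** 2))
        total = kernel.sum()
        if total > 0:
            # Normalise over the image so each head contributes exactly 1.
            density += kernel / total

    return density


if __name__ == "__main__":
    annotations = [(10, 10), (12, 14), (30, 40), (31, 42)]
    dmap = density_map(annotations, (64, 64))
    print(dmap.shape, dmap.sum())
```

Because each kernel is normalised to unit mass, summing the map recovers the number of annotated heads, which is what allows a network that regresses the density map to yield a count by integration.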