Precise Detection in Densely Packed Scenes

Man-made scenes can be densely packed, containing numerous objects, often identical, positioned in close proximity. We show that precise object detection in such scenes remains a challenging frontier even for state-of-the-art object detectors. We propose a novel, deep-learning-based method for precise object detection, designed for such challenging settings. Our contributions include: (1) a layer for estimating the Jaccard index as a detection quality score; (2) a novel EM merging unit, which uses our quality scores to resolve detection overlap ambiguities; and (3) an extensive, annotated data set, SKU-110K, representing packed retail environments, released for training and testing under such extreme settings. Detection tests on SKU-110K and counting tests on the CARPK and PUCPR+ data sets show that our method outperforms the existing state of the art by substantial margins. The code and data will be made available at \url{www.github.com/eg4000/SKU110K_CVPR19}.
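
For readers unfamiliar with the quality score mentioned in contribution (1), the Jaccard index of two boxes is their intersection area divided by their union area (intersection over union). The following is a minimal sketch of that standard definition only, not of the learned estimation layer proposed here. The function name `jaccard_index` and the `(x1, y1, x2, y2)` corner format are assumptions made for illustration.

```python
def jaccard_index(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2
    (an assumed format, chosen for this illustration).
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region, clamped to zero
    # when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    # Two degenerate (zero-area) boxes have no meaningful overlap.
    return inter / union if union > 0 else 0.0
```

In a densely packed scene, neighbouring detections of adjacent objects overlap heavily, so a score in this range (0 for disjoint boxes, 1 for identical ones) is a natural measure of how well a detection fits its object.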