Segmentation is All You Need

Region proposal mechanisms are essential for existing deep learning approaches to object detection in images. Although they generally achieve good detection performance under normal circumstances, their recall in scenes with extreme cases is unacceptably low. This is mainly because bounding box annotations contain much environmental noise, and non-maximum suppression (NMS) is required to select target boxes. Therefore, in this paper, we propose the first anchor-free and NMS-free object detection model, called weakly supervised multimodal annotation segmentation (WSMA-Seg), which utilizes segmentation models to achieve accurate and robust object detection without NMS. In WSMA-Seg, multimodal annotations are proposed to achieve instance-aware segmentation using weakly supervised bounding boxes; we also develop a run-data-based following algorithm to trace the contours of objects. In addition, we propose multi-scale pooling segmentation (MSP-Seg) as the underlying segmentation model of WSMA-Seg to achieve more accurate segmentation and to enhance the detection accuracy of WSMA-Seg. Experimental results on multiple datasets show that the proposed WSMA-Seg approach outperforms state-of-the-art detectors.
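
To illustrate why reliance on NMS can lower recall when objects overlap heavily, the following is a minimal sketch of standard greedy NMS. It is not part of WSMA-Seg. The box coordinates, scores, and the 0.5 IoU threshold are assumed values chosen only for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes kept by standard greedy NMS."""
    # Visit boxes from highest to lowest score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any kept box too strongly.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Assumed toy scene: two distinct but heavily overlapping objects
# (indices 0 and 1) and one isolated object (index 2).
boxes = [(0, 0, 10, 10), (2, 0, 12, 10), (30, 30, 40, 40)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)
print(kept)
```

In this toy scene the first two boxes have an IoU of about 0.67, so the lower-scoring one is discarded even though it is assumed to cover a separate object. This is the kind of recall loss in crowded scenes that motivates an NMS-free design.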