One-Shot Segmentation in Clutter

We tackle the problem of one-shot segmentation: finding and segmenting a previously unseen object in a cluttered scene based on a single instruction example. We propose a novel dataset, which we call $\textit{cluttered Omniglot}$. Using a baseline architecture that combines a Siamese embedding for detection with a U-net for segmentation, we show that increasing levels of clutter make the task progressively harder. Using oracle models with access to various amounts of ground-truth information, we evaluate different aspects of the problem and show that in this kind of visual search task, detection and segmentation are two intertwined problems, the solution to each of which helps solve the other. We therefore introduce $\textit{MaskNet}$, an improved model that attends to multiple candidate locations, generates segmentation proposals to mask out background clutter, and selects among the segmented objects. Our findings suggest that image recognition models based on iterative refinement of object detection and foreground segmentation may provide a way to deal with highly cluttered scenes.
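The segment-then-select idea behind MaskNet can be illustrated with a minimal NumPy sketch. This is a hypothetical toy, not the paper's implementation: `embed` stands in for the Siamese embedding, `segment` stands in for the segmentation proposal network, and `masknet_select` loops over candidate locations, masks out background clutter in each window, and keeps the candidate whose masked crop best matches the target.

```python
import numpy as np

def embed(patch):
    # Stand-in "Siamese embedding": flatten and L2-normalize the patch.
    v = patch.ravel().astype(float)
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def segment(patch):
    # Stand-in segmentation proposal: foreground = above-mean pixels.
    return (patch > patch.mean()).astype(float)

def masknet_select(scene, target, candidates, size):
    """Score each candidate window after masking background clutter;
    return the best candidate location and its segmentation mask."""
    t = embed(target)
    best_score, best = -np.inf, None
    for (y, x) in candidates:
        crop = scene[y:y + size, x:x + size]
        if crop.shape != (size, size):
            continue  # candidate window falls off the scene
        mask = segment(crop)               # segmentation proposal
        masked = crop * mask               # suppress background clutter
        score = float(embed(masked) @ t)   # cosine similarity to target
        if score > best_score:
            best_score, best = score, ((y, x), mask)
    return best

# Toy usage: plant a 3x3 cross-shaped target in a cluttered 12x12 scene.
rng = np.random.default_rng(0)
scene = rng.random((12, 12)) * 0.2         # low-amplitude clutter
target = np.array([[0., 1., 0.],
                   [1., 1., 1.],
                   [0., 1., 0.]])
scene[4:7, 5:8] += target                  # true location: (4, 5)
candidates = [(0, 0), (4, 5), (8, 2)]
loc, mask = masknet_select(scene, target, candidates, 3)
```

Masking before scoring is the key step: without it, clutter inside the window contaminates the embedding, which is exactly the failure mode the abstract attributes to cluttered scenes.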