Zoom In and Out: A Mixed-scale Triplet Network for Camouflaged Object Detection

The recently proposed camouflaged object detection (COD) task attempts to segment objects that are visually blended into their surroundings, which is extremely complex and difficult in real-world scenarios. Apart from the high intrinsic similarity between camouflaged objects and their background, the objects are usually diverse in scale, fuzzy in appearance, and even severely occluded. To deal with these problems, we propose a mixed-scale triplet network, \textbf{ZoomNet}, which mimics the behavior of humans when observing vague images, i.e., zooming in and out. Specifically, ZoomNet employs the zoom strategy to learn discriminative mixed-scale semantics through the designed scale integration unit and hierarchical mixed-scale unit, which fully explore imperceptible clues between the candidate objects and their background surroundings. Moreover, considering the uncertainty and ambiguity derived from indistinguishable textures, we construct a simple yet effective regularization constraint, the uncertainty-aware loss, to encourage the model to produce accurate predictions with higher confidence in candidate regions. Without bells and whistles, our proposed highly task-friendly model consistently surpasses 23 existing state-of-the-art methods on four public datasets. In addition, its superior performance over recent cutting-edge models on the salient object detection (SOD) task also verifies the effectiveness and generality of our model. The code will be available at \url{https://github.com/lartpang/ZoomNet}.
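
To make the intent of the uncertainty-aware loss more concrete, the following is a minimal sketch of a regularizer that penalizes ambiguous predictions. The per-pixel form 1 - |2p - 1|^2, the function name `uncertainty_penalty`, and the sample values are assumptions chosen for illustration; the abstract does not specify the exact formulation, and this should not be read as the released implementation.

```python
def uncertainty_penalty(probs):
    """Mean ambiguity penalty over foreground probabilities in [0, 1].

    Assumed illustrative form: each term is 1 - |2p - 1|^2, which is 0 when
    p is exactly 0 or 1 (fully confident) and 1 when p is 0.5 (fully ambiguous).
    """
    if not probs:
        raise ValueError("probs must be non-empty")
    return sum(1.0 - abs(2.0 * p - 1.0) ** 2 for p in probs) / len(probs)


# Hypothetical predictions for four pixels.
confident = [0.0, 1.0, 1.0, 0.0]   # every pixel is decided
ambiguous = [0.5, 0.5, 0.5, 0.5]   # every pixel is undecided

print(uncertainty_penalty(confident))
print(uncertainty_penalty(ambiguous))
```

Under this assumed form, the penalty is smallest for fully decided pixels and largest for undecided ones, so adding it to a segmentation loss would push predictions away from 0.5. It says nothing about whether a confident prediction is correct, which is why it would act only as a regularizer alongside a supervised term.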