Zoom Better to See Clearer: Human and Object Parsing with Hierarchical Auto-Zoom Net

Liang-Chieh Chen, Fangting Xia, Peng Wang, Alan L. Yuille
Abstract

Parsing articulated objects, e.g. humans and animals, into semantic parts (e.g. body, head, and arms) from natural images is a challenging and fundamental problem for computer vision. A major difficulty is the large variability of scale and location for objects and their corresponding parts. Even limited mistakes in estimating scale and location will degrade the parsing output and cause errors in boundary details. To tackle these difficulties, we propose a "Hierarchical Auto-Zoom Net" (HAZN) for object part parsing that adapts to the local scales of objects and parts. HAZN is a sequence of two "Auto-Zoom Nets" (AZNs), each employing fully convolutional networks that perform two tasks: (1) predict the locations and scales of object instances (the first AZN) or their parts (the second AZN); (2) estimate the part scores for the predicted object instance or part regions. Our model can adaptively "zoom" (resize) predicted image regions to their proper scales to refine the parsing. We conduct extensive experiments on the PASCAL part datasets for humans, horses, and cows. For humans, our approach significantly outperforms the state of the art by 5% mIOU and is especially better at segmenting small instances and small parts. We obtain similar improvements over alternative methods for parsing cows and horses. In summary, our strategy of first zooming into objects and then zooming into parts is very effective. It also lets us process different regions of the image at different scales adaptively, so that, for example, we do not waste computational resources scaling the entire image.
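The two-stage pipeline sketched in the abstract can be made concrete in pseudocode. The sketch below is an illustration only: `DummyAZN`, `zoom`, `paste_scores`, and `hazn_parse` are hypothetical stand-ins, not the authors' implementation, and fixed boxes replace the fully convolutional networks that actually predict regions and part scores.

```python
import numpy as np

def zoom(image, box, size):
    """Crop box = (x0, y0, x1, y1) from image and resize to size = (h, w).

    Nearest-neighbor resize via index mapping; a real system would use
    bilinear interpolation from an image library.
    """
    x0, y0, x1, y1 = box
    crop = image[y0:y1, x0:x1]
    h, w = size
    ys = np.linspace(0, crop.shape[0] - 1, h).astype(int)
    xs = np.linspace(0, crop.shape[1] - 1, w).astype(int)
    return crop[ys][:, xs]

def paste_scores(canvas, box, scores):
    """Resize part scores back to the box's native resolution and max-merge."""
    x0, y0, x1, y1 = box
    native = zoom(scores, (0, 0, scores.shape[1], scores.shape[0]),
                  (y1 - y0, x1 - x0))
    canvas[y0:y1, x0:x1] = np.maximum(canvas[y0:y1, x0:x1], native)

class DummyAZN:
    """Stand-in for an Auto-Zoom Net: proposes regions and scores parts."""
    def __init__(self, boxes):
        self.boxes = boxes

    def predict_regions(self, image):
        # Task 1: predict region boxes and the scale (h, w) to zoom them to.
        # Here the proposals are fixed; the paper predicts them with an FCN.
        return [(b, (64, 64)) for b in self.boxes]

    def part_scores(self, crop):
        # Task 2: per-pixel part scores; the paper runs an FCN on the crop.
        return np.random.rand(*crop.shape[:2])

def hazn_parse(image, object_azn, part_azn):
    """Zoom into objects first, then into their parts, merging scores."""
    scores = np.zeros(image.shape[:2])
    # Stage 1: the first AZN predicts object instances and their scales.
    for obj_box, obj_scale in object_azn.predict_regions(image):
        obj_crop = zoom(image, obj_box, obj_scale)
        paste_scores(scores, obj_box, object_azn.part_scores(obj_crop))
        # Stage 2: the second AZN refines parts; for simplicity this stub
        # returns part boxes directly in whole-image coordinates.
        for part_box, part_scale in part_azn.predict_regions(obj_crop):
            part_crop = zoom(image, part_box, part_scale)
            paste_scores(scores, part_box, part_azn.part_scores(part_crop))
    return scores  # one confidence map here; the real model keeps one per part

image = np.random.rand(128, 128)
parsing = hazn_parse(image, DummyAZN([(10, 10, 90, 110)]),
                     DummyAZN([(20, 20, 50, 60)]))
print(parsing.shape)  # (128, 128)
```

The point of the structure is that each region, object or part, is processed at its own predicted scale rather than rescaling the whole image, which is what makes the approach efficient on small instances and small parts.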
