
CoHD: A Counting-Aware Hierarchical Decoding Framework for Generalized Referring Expression Segmentation

Luo, Zhuoyan ; Wu, Yinghao ; Cheng, Tianheng ; Liu, Yong ; Xiao, Yicheng ; Wang, Hongfa ; Zhang, Xiao-Ping ; Yang, Yujiu
Abstract

The newly proposed Generalized Referring Expression Segmentation (GRES) amplifies the formulation of classic RES by involving complex multiple/non-target scenarios. Recent approaches address GRES by directly extending the well-adopted RES frameworks with object-existence identification. However, these approaches tend to encode multi-granularity object information into a single representation, which makes it difficult to precisely represent comprehensive objects of different granularity. Moreover, the simple binary object-existence identification across all referent scenarios fails to specify their inherent differences, incurring ambiguity in object understanding. To tackle the above issues, we propose a Counting-Aware Hierarchical Decoding framework (CoHD) for GRES. By decoupling the intricate referring semantics into different granularities with a visual-linguistic hierarchy, and dynamically aggregating them with intra- and inter-selection, CoHD boosts multi-granularity comprehension with the reciprocal benefit of the hierarchical nature. Furthermore, we incorporate the counting ability by embodying multiple/single/non-target scenarios into count- and category-level supervision, facilitating comprehensive object perception. Experimental results on the gRefCOCO, Ref-ZOM, R-RefCOCO, and RefCOCO benchmarks demonstrate the effectiveness and rationality of CoHD, which outperforms state-of-the-art GRES methods by a remarkable margin. Code is available at https://github.com/RobertLuo1/CoHD.
