Bring Adaptive Binding Prototypes to Generalized Referring Expression Segmentation

Referring Expression Segmentation (RES) has attracted rising attention, aiming to identify and segment objects based on natural language expressions. While substantial progress has been made in RES, the emergence of Generalized Referring Expression Segmentation (GRES) introduces new challenges by allowing expressions to describe multiple objects or to lack specific object references. Existing RES methods usually rely on sophisticated encoder-decoder and feature fusion modules, and struggle to generate class prototypes that match each instance individually when confronted with the complex referents and binary labels of GRES. In this paper, reevaluating the differences between RES and GRES, we propose a novel Model with Adaptive Binding Prototypes (MABP) that adaptively binds queries to object features in the corresponding region. It enables different query vectors to match instances of different categories or different parts of the same instance, significantly expanding the decoder's flexibility, dispersing global pressure across all queries, and easing the demands on the encoder. Experimental results demonstrate that MABP significantly outperforms state-of-the-art methods on all three splits of the gRefCOCO dataset. Meanwhile, MABP also surpasses state-of-the-art methods on the RefCOCO+ and G-Ref datasets, and achieves very competitive results on RefCOCO. Code is available at https://github.com/buptLwz/MABP
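
The idea of binding each query to the features of its own region can be illustrated with a minimal sketch. This is not the MABP implementation: the uniform grid of regions, the one-query-per-region layout, the dot-product attention, the zero threshold, and the names `region_pixels` and `bind_and_segment` are all assumptions made purely for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def region_pixels(height, width, rows, cols):
    """Map each region index to the (y, x) pixels it covers (assumed uniform grid)."""
    regions = {k: [] for k in range(rows * cols)}
    for y in range(height):
        for x in range(width):
            k = (y * rows // height) * cols + (x * cols // width)
            regions[k].append((y, x))
    return regions

def bind_and_segment(features, queries, rows, cols):
    """features: H x W x C nested lists; queries: one C-dim vector per region."""
    height, width = len(features), len(features[0])
    mask = [[0] * width for _ in range(height)]
    for k, pixels in region_pixels(height, width, rows, cols).items():
        q = queries[k]
        # Bind: the query attends only to features inside its own region.
        weights = softmax([dot(q, features[y][x]) for y, x in pixels])
        bound = [
            sum(w * features[y][x][c] for w, (y, x) in zip(weights, pixels))
            for c in range(len(q))
        ]
        # Segment: the bound query scores only the pixels of its own region
        # (hypothetical rule: positive similarity means foreground).
        for y, x in pixels:
            mask[y][x] = 1 if dot(bound, features[y][x]) > 0 else 0
    return mask

# Toy 2 x 2 feature map with 2 channels, split into a left and a right region.
features = [[[2, 0], [0, 1]],
            [[-1, 0], [0, -2]]]
queries = [[3, 0], [0, -3]]
mask = bind_and_segment(features, queries, rows=1, cols=2)
```

In this toy setup each query becomes a prototype for whatever it matches locally, so two queries can pick out different instances, or a region can end up with an empty mask when nothing in it matches, without any single prototype having to describe the whole image.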