Referring Image Segmentation

Referring Image Segmentation (RIS) aims to segment the target object referred to by a natural language expression. However, previous methods rely on the strong assumption that each expression describes exactly one object in the image, which often does not hold in real applications. As a result, such methods fail when the expression refers to no object or to multiple objects.

The goal of referring image segmentation is to segment the referent of a natural language expression. Because text and images have different data properties, it is difficult for a network to align text features with pixel-level features.
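
To make the text-to-pixel alignment idea concrete, here is a minimal sketch. It is not the method of any particular RIS model. It assumes the text has already been encoded into one embedding vector and the image into a grid of per-pixel feature vectors in the same space. The function names (`cosine`, `segment_by_similarity`), the toy values, and the 0.5 threshold are all hypothetical and chosen only for illustration. Each pixel is marked as part of the referent when its feature is similar enough to the text embedding.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def segment_by_similarity(pixel_features, text_embedding, threshold=0.5):
    """Return an H x W binary mask: 1 where a pixel feature matches the text.

    pixel_features is an H x W grid of feature vectors (hypothetical encoder
    output); threshold is an arbitrary illustrative cutoff.
    """
    return [
        [1 if cosine(feat, text_embedding) > threshold else 0 for feat in row]
        for row in pixel_features
    ]


# Toy 2 x 3 "image" with 2-dimensional pixel features (made-up values).
pixel_features = [
    [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.1], [0.1, 1.0]],
]

# A text embedding that aligns with the first feature direction.
mask = segment_by_similarity(pixel_features, [1.0, 0.0])

# A text embedding that matches nothing in the image yields an empty mask,
# which is the "no target" case that single-object assumptions cannot express.
empty_mask = segment_by_similarity(pixel_features, [-1.0, -1.0])
```

In this sketch the mask is not forced to contain exactly one region, so an expression that matches no pixels or several separate regions is handled the same way as one that matches a single object. Real systems learn the encoders and the alignment jointly, which is where the difficulty described above arises.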