Image Segmentation Using Text and Image Prompts

Lüddecke Timo ; Ecker Alexander S.

Abstract

Image segmentation is usually addressed by training a model for a fixed set of object classes. Incorporating additional classes or more complex queries later is expensive as it requires re-training the model on a dataset that encompasses these expressions. Here we propose a system that can generate image segmentations based on arbitrary prompts at test time. A prompt can be either a text or an image. This approach enables us to create a unified model (trained once) for three common segmentation tasks, which come with distinct challenges: referring expression segmentation, zero-shot segmentation and one-shot segmentation. We build upon the CLIP model as a backbone which we extend with a transformer-based decoder that enables dense prediction. After training on an extended version of the PhraseCut dataset, our system generates a binary segmentation map for an image based on a free-text prompt or on an additional image expressing the query. We analyze different variants of the latter image-based prompts in detail. This novel hybrid input allows for dynamic adaptation not only to the three segmentation tasks mentioned above, but to any binary segmentation task where a text or image query can be formulated. Finally, we find our system to adapt well to generalized queries involving affordances or properties. Code is available at https://eckerlab.org/code/clipseg.

