
Bi-directional Training for Composed Image Retrieval via Text Prompt Learning

Liu, Zheyuan ; Sun, Weixuan ; Hong, Yicong ; Teney, Damien ; Gould, Stephen
Abstract

Composed image retrieval searches for a target image based on a multi-modal user query comprised of a reference image and modification text describing the desired changes. Existing approaches to solving this challenging task learn a mapping from the (reference image, modification text)-pair to an image embedding that is then matched against a large image corpus. One area that has not yet been explored is the reverse direction, which asks the question, what reference image when modified as described by the text would produce the given target image? In this work we propose a bi-directional training scheme that leverages such reversed queries and can be applied to existing composed image retrieval architectures with minimum changes, which improves the performance of the model. To encode the bi-directional query we prepend a learnable token to the modification text that designates the direction of the query and then finetune the parameters of the text embedding module. We make no other changes to the network architecture. Experiments on two standard datasets show that our novel approach achieves improved performance over a baseline BLIP-based model that itself already achieves competitive performance. Our code is released at https://github.com/Cuberick-Orion/Bi-Blip4CIR.
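
The bi-directional query construction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `FORWARD_TOKEN`, `BACKWARD_TOKEN`, `Query`, and `make_bidirectional_queries` are hypothetical, and the literal strings `[FWD]` and `[BWD]` only stand in for the learnable direction tokens, whose embeddings would in practice be trained together with the text embedding module. The sketch shows only how one (reference, text, target) triplet could yield a forward query and a reversed query.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical placeholder strings standing in for the learnable
# direction tokens; a real model would learn their embeddings.
FORWARD_TOKEN = "[FWD]"
BACKWARD_TOKEN = "[BWD]"


@dataclass(frozen=True)
class Query:
    query_image: str   # id of the image on the query side
    text: str          # modification text with a direction token prepended
    target_image: str  # id of the image the query should retrieve


def make_bidirectional_queries(
    reference: str, modification: str, target: str
) -> Tuple[Query, Query]:
    """Build a forward and a reversed query from one training triplet."""
    # Forward: (reference image, text) should retrieve the target image.
    forward = Query(reference, f"{FORWARD_TOKEN} {modification}", target)
    # Reversed: (target image, same text marked as reversed) should
    # retrieve the reference image.
    backward = Query(target, f"{BACKWARD_TOKEN} {modification}", reference)
    return forward, backward


# Illustrative triplet with made-up image ids.
triplets = [("img_001", "make the dress red", "img_002")]
queries = [q for t in triplets for q in make_bidirectional_queries(*t)]
for q in queries:
    print(q)
```

Under these assumptions each training triplet contributes two queries, and the direction token is the only signal telling the text encoder which way the modification should be read, which is consistent with the abstract's statement that no other architectural changes are made.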
