8 months ago

Multimodal Representation

Multi-Task Learning

Method/Architecture

Xiao Han, Sen He, Li Zhang, Tao Xiang

Abstract

Text-based person search (TBPS) aims at retrieving a target person from an image gallery with a descriptive text query. Solving such a fine-grained cross-modal retrieval task is challenging, which is further hampered by the lack of large-scale datasets. In this paper, we present a framework with two novel components to handle the problems brought by limited data. Firstly, to fully utilize the existing small-scale benchmarking datasets for more discriminative feature learning, we introduce a cross-modal momentum contrastive learning framework to enrich the training data for a given mini-batch. Secondly, we propose to transfer knowledge learned from existing coarse-grained large-scale datasets containing image-text pairs from drastically different problem domains to compensate for the lack of TBPS training data. A transfer learning method is designed so that useful information can be transferred despite the large domain gap. Armed with these components, our method achieves new state of the art on the CUHK-PEDES dataset with significant improvements over the prior art in terms of Rank-1 and mAP. Our code is available at https://github.com/BrandonHanx/TextReID.
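The abstract does not spell out how momentum contrastive learning enriches a mini-batch, so the following is only a generic MoCo-style sketch in NumPy, not the authors' implementation. It assumes the usual ingredients of momentum contrast: a slowly updated (exponential moving average) key encoder, a fixed-size queue of past keys from the other modality used as extra negatives, and an InfoNCE loss. All function names, the temperature of 0.07, and the momentum of 0.99 are hypothetical choices made for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def momentum_update(online_params, momentum_params, m=0.99):
    """EMA update: the momentum encoder slowly tracks the online encoder.

    Both arguments are dicts mapping parameter names to arrays (an assumed,
    simplified stand-in for real model weights).
    """
    return {
        name: m * momentum_params[name] + (1.0 - m) * online_params[name]
        for name in online_params
    }


def queue_info_nce(query, positive_key, queue, temperature=0.07):
    """InfoNCE loss with negatives drawn from a queue of past keys.

    query:        (B, D) features of one modality (e.g. text).
    positive_key: (B, D) matching features of the other modality (e.g. image).
    queue:        (K, D) keys stored from earlier mini-batches (negatives).
    """
    q = l2_normalize(query)
    k = l2_normalize(positive_key)
    neg = l2_normalize(queue)

    pos_logit = np.sum(q * k, axis=1, keepdims=True)   # (B, 1)
    neg_logits = q @ neg.T                             # (B, K)
    logits = np.concatenate([pos_logit, neg_logits], axis=1) / temperature

    # Numerically stable log-softmax; the positive sits in column 0.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_prob[:, 0].mean())


def enqueue(queue, new_keys):
    """FIFO update: append the newest keys and drop the oldest ones.

    Assumes a non-empty queue; its size stays fixed.
    """
    size = queue.shape[0]
    return np.concatenate([queue, new_keys], axis=0)[-size:]
```

In this sketch the queue is what lets a small mini-batch be contrasted against many more negatives than it contains: each query is compared with its own positive plus K stored keys, and the queue is refreshed after every step with `enqueue`.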
