Text-Based Person Search with Limited Data

Text-based person search (TBPS) aims to retrieve a target person from an image gallery given a descriptive text query. This fine-grained cross-modal retrieval task is challenging in itself and is further hampered by the lack of large-scale datasets. In this paper, we present a framework with two novel components to handle the problems brought by limited data. First, to fully utilize the existing small-scale benchmark datasets for more discriminative feature learning, we introduce a cross-modal momentum contrastive learning framework to enrich the training data for a given mini-batch. Second, we propose to transfer knowledge learned from existing coarse-grained large-scale datasets containing image-text pairs from drastically different problem domains to compensate for the lack of TBPS training data. A transfer learning method is designed so that useful information can be transferred despite the large domain gap. Armed with these components, our method achieves a new state of the art on the CUHK-PEDES dataset, with significant improvements over the prior art in terms of Rank-1 and mAP. Our code is available at https://github.com/BrandonHanx/TextReID.
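To make the idea of enriching a mini-batch concrete, the following is a minimal sketch of generic momentum contrastive learning applied across modalities. It is not taken from the released code: the function names, the toy 2-D features, the queue size, the momentum value, and the temperature are all assumptions chosen for illustration. A slowly updated key encoder produces features that are stored in a first-in-first-out queue, so that a text query can be contrasted against image features from earlier mini-batches as additional negatives.

```python
import math
from collections import deque


def momentum_update(query_params, key_params, m=0.999):
    """Exponential moving average: key <- m * key + (1 - m) * query."""
    return [m * k + (1.0 - m) * q for q, k in zip(query_params, key_params)]


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(query, positive_key, negative_keys, temperature=0.07):
    """Cross-entropy over one positive key and any number of queued negatives."""
    q = l2_normalize(query)
    logits = [dot(q, l2_normalize(positive_key)) / temperature]
    logits += [dot(q, l2_normalize(k)) / temperature for k in negative_keys]
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    return log_sum - logits[0]


# Hypothetical FIFO queue of image features from earlier mini-batches.
image_key_queue = deque(maxlen=4)
for feat in ([0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]):
    image_key_queue.append(feat)

text_query = [1.0, 0.1]      # toy text feature from the query encoder
matching_image = [0.9, 0.2]  # toy image feature from the momentum (key) encoder

# Text-to-image loss: the queue supplies negatives beyond the current batch.
loss = info_nce(text_query, matching_image, list(image_key_queue))

# Enqueue the newest key; the oldest entry is dropped once the queue is full.
image_key_queue.append(matching_image)
```

In a full system the same loss would typically also be computed in the image-to-text direction with a second queue of text features, and `momentum_update` would be applied to the key encoder's weights after every optimizer step; both of those details are assumptions about a typical setup rather than statements about this paper's implementation.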