Image-Text Matching
Image-text matching is a subtask of cross-modal retrieval (CMR) that aims to establish associations between images and their corresponding textual descriptions. It works in both directions: given a text query, retrieve the relevant images; given an image query, retrieve the corresponding text descriptions. The task is challenging because of the heterogeneity gap between image and text representations, which come from different modalities and are not directly comparable. It is widely applied in content-based image search, visual question answering, and multimodal summarization.
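To make the two retrieval directions concrete, here is a minimal sketch. It assumes that images and texts have already been encoded into vectors in a shared embedding space by some image and text encoders (not shown and not specified here). Cosine similarity is used as one common scoring choice. Bridging the heterogeneity gap is exactly the encoders' job, so this sketch covers only the ranking step that follows. The helper names `similarity_matrix`, `text_to_image`, and `image_to_text` are hypothetical.

```python
import math
from typing import List, Sequence

# Assumption: embeddings already live in a shared space produced by
# hypothetical image/text encoders; only the ranking step is shown.


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_matrix(images: List[List[float]], texts: List[List[float]]) -> List[List[float]]:
    """S[i][j] is the similarity between image i and text j."""
    return [[cosine(img, txt) for txt in texts] for img in images]


def text_to_image(S: List[List[float]], j: int) -> List[int]:
    """Text-to-image retrieval: rank image indices for text query j, best first."""
    return sorted(range(len(S)), key=lambda i: S[i][j], reverse=True)


def image_to_text(S: List[List[float]], i: int) -> List[int]:
    """Image-to-text retrieval: rank text indices for image query i, best first."""
    return sorted(range(len(S[i])), key=lambda j: S[i][j], reverse=True)


if __name__ == "__main__":
    # Toy 3-D embeddings chosen by hand for illustration only.
    images = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    texts = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.8], [0.1, 0.9, 0.2]]
    S = similarity_matrix(images, texts)
    print(text_to_image(S, 1))  # → [2, 1, 0]
    print(image_to_text(S, 0))  # → [0, 2, 1]
```

Both directions read the same similarity matrix, one along a column and one along a row. This is why a single shared embedding space serves both text-to-image and image-to-text retrieval.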