HyperAI

Zero-Shot Cross-Modal Retrieval

Zero-Shot Cross-Modal Retrieval is the task of retrieving relevant items across different modalities (for example, finding images that match a text query, or vice versa) for classes or data that were not seen during training. The main challenge is the heterogeneity gap: data from different modalities have inherently different types and representations, so their similarity cannot be measured directly. Existing methods typically bridge this gap by learning a shared latent representation space into which data from every modality are projected, so that cross-modal items can be compared directly with a single similarity measure. The task has significant practical value in areas such as e-commerce.
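The idea of a shared latent space can be illustrated with a minimal sketch. It assumes that each modality already has a feature vector of a different length (so the vectors cannot be compared directly) and that a linear projection per modality maps those features into a common space, where cosine similarity ranks the candidates. The names `W_IMAGE`, `W_TEXT`, `gallery`, and `retrieve` are hypothetical, and the matrices are hand-chosen stand-ins for parameters that a real system would learn from data; this is not the method of any particular paper.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def project(weights: Matrix, features: Sequence[float]) -> Vector:
    """Linear projection W @ x from a modality-specific space into the shared space."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot product of the two unit-normalized vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def retrieve(
    query_text: Sequence[float],
    text_proj: Matrix,
    gallery: Dict[str, Sequence[float]],
    image_proj: Matrix,
    top_k: int,
) -> List[Tuple[str, float]]:
    """Rank gallery images against a text query inside the shared latent space."""
    q = project(text_proj, query_text)
    scores = [(name, cosine(q, project(image_proj, feats))) for name, feats in gallery.items()]
    scores.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
    return scores[:top_k]


# Hypothetical projections: images have 3-d features, text has 2-d features,
# and both are mapped into the same 2-d shared space.
W_IMAGE: Matrix = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
W_TEXT: Matrix = [[0.0, 1.0], [1.0, 0.0]]

# Hypothetical image gallery with toy feature vectors.
gallery = {
    "img_a": [1.0, 0.0, 5.0],
    "img_b": [0.0, 1.0, 5.0],
    "img_c": [0.7, 0.7, 0.0],
}

ranked = retrieve([0.0, 1.0], W_TEXT, gallery, W_IMAGE, top_k=3)
print([name for name, _ in ranked])  # ['img_a', 'img_c', 'img_b']
```

Because retrieval only compares vectors in the shared space, a query can be scored against any gallery item without a class label, which is what makes the zero-shot setting possible once the projections generalize to unseen data.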