
Abstract

Cross-modal matching has recently gained significant popularity as a means of retrieval across multi-modal data, and existing works rely heavily on the implicit assumption that the training data pairs are perfectly aligned. However, this ideal assumption rarely holds in practice because of inevitably mismatched data pairs, a.k.a. noisy correspondence, which wrongly force mismatched data to be similar and thus degrade performance. Although some recent methods have attempted to address this problem, they still face two challenging issues: 1) unreliable data division, which leads to inefficient training, and 2) unstable prediction, which leads to matching failure. To address these problems, we propose an efficient Uncertainty-Guided Noisy Correspondence Learning (UGNCL) framework to achieve noise-robust cross-modal matching. Specifically, a novel Uncertainty Guided Division (UGD) algorithm is designed to leverage the derived uncertainty to divide the data into clean, noisy and hard partitions, which mitigates the impact of easily determined noisy pairs. Meanwhile, an efficient Trusted Robust Loss (TRL) is explicitly designed to use the uncertainty to recast the soft margins, which are calibrated by confident yet erroneous soft correspondence labels, for the data pairs in the hard partition. This increases the importance of matched pairs and decreases that of mismatched pairs, further alleviating the impact of noisy pairs and improving robustness. Extensive experiments conducted on three public datasets highlight the superiority of the proposed framework and show its competitive performance compared with state-of-the-art methods. The code is available at https://github.com/qxzha/UGNCL.
