8 months ago

Multimodal Representation

Haochen Han Kaiyao Miao Qinghua Zheng Minnan Luo

Abstract

Despite the success of multimodal learning in cross-modal retrieval task, theremarkable progress relies on the correct correspondence among multimedia data.However, collecting such ideal data is expensive and time-consuming. Inpractice, most widely used datasets are harvested from the Internet andinevitably contain mismatched pairs. Training on such noisy correspondencedatasets causes performance degradation because the cross-modal retrievalmethods can wrongly enforce the mismatched data to be similar. To tackle thisproblem, we propose a Meta Similarity Correction Network (MSCN) to providereliable similarity scores. We view a binary classification task as themeta-process that encourages the MSCN to learn discrimination from positive andnegative meta-data. To further alleviate the influence of noise, we design aneffective data purification strategy using meta-data as prior knowledge toremove the noisy samples. Extensive experiments are conducted to demonstratethe strengths of our method in both synthetic and real-world noises, includingFlickr30K, MS-COCO, and Conceptual Captions.

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding

Ready-to-use GPUs

Best Pricing

Get Started View Pricing

HyperAI Newsletters

Subscribe to our latest updates

We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning

Powered by MailChimp

8 months ago

Multimodal Representation

Haochen Han Kaiyao Miao Qinghua Zheng Minnan Luo

Abstract

Despite the success of multimodal learning in cross-modal retrieval task, theremarkable progress relies on the correct correspondence among multimedia data.However, collecting such ideal data is expensive and time-consuming. Inpractice, most widely used datasets are harvested from the Internet andinevitably contain mismatched pairs. Training on such noisy correspondencedatasets causes performance degradation because the cross-modal retrievalmethods can wrongly enforce the mismatched data to be similar. To tackle thisproblem, we propose a Meta Similarity Correction Network (MSCN) to providereliable similarity scores. We view a binary classification task as themeta-process that encourages the MSCN to learn discrimination from positive andnegative meta-data. To further alleviate the influence of noise, we design aneffective data purification strategy using meta-data as prior knowledge toremove the noisy samples. Extensive experiments are conducted to demonstratethe strengths of our method in both synthetic and real-world noises, includingFlickr30K, MS-COCO, and Conceptual Captions.

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding

Ready-to-use GPUs

Best Pricing

Get Started View Pricing

HyperAI Newsletters

Subscribe to our latest updates

We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning

Powered by MailChimp

Noisy Correspondence Learning with Meta Similarity Correction | Papers | HyperAI