8 months ago

Multimodal Representation

Computer Vision

Giovanni Burbi Alberto Baldrati Lorenzo Agnolucci Marco Bertini Alberto Del Bimbo

Abstract

Multimodal image-text memes are prevalent on the internet, serving as a unique form of communication that combines visual and textual elements to convey humor, ideas, or emotions. However, some memes take a malicious turn, promoting hateful content and perpetuating discrimination. Detecting hateful memes within this multimodal context is a challenging task that requires understanding the intertwined meaning of text and images. In this work, we address this issue by proposing a novel approach named ISSUES for multimodal hateful meme classification. ISSUES leverages a pre-trained CLIP vision-language model and the textual inversion technique to effectively capture the multimodal semantic content of the memes. The experiments show that our method achieves state-of-the-art results on the Hateful Memes Challenge and HarMeme datasets. The code and the pre-trained models are publicly available at https://github.com/miccunifi/ISSUES.

