CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs

Daoan Zhang Junming Yang Hanjia Lyu Zijian Jin Yuan Yao Mingkai Chen Jiebo Luo

Abstract

When exploring the development of Artificial General Intelligence (AGI), a critical task for these models involves interpreting and processing information from multiple image inputs. However, Large Multimodal Models (LMMs) encounter two issues in such scenarios: (1) a lack of fine-grained perception, and (2) a tendency to blend information across multiple images. We first extensively investigate the capability of LMMs to perceive fine-grained visual details when dealing with multiple input images. The research focuses on two aspects: first, image-to-image matching (to evaluate whether LMMs can effectively reason and pair relevant images), and second, multi-image-to-text matching (to assess whether LMMs can accurately capture and summarize detailed image information). We conduct evaluations on a range of both open-source and closed-source large models, including GPT-4V, Gemini, OpenFlamingo, and MMICL. To enhance model performance, we further develop a Contrastive Chain-of-Thought (CoCoT) prompting approach based on multi-input multimodal models. This method requires LMMs to compare the similarities and differences among multiple image inputs, and then guides the models to answer detailed questions about multi-image inputs based on the identified similarities and differences. Our experimental results showcase CoCoT's proficiency in enhancing the multi-image comprehension capabilities of large multimodal models.
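The two-stage idea described in the abstract (first compare the images, then answer using that comparison) can be illustrated with a minimal prompt-building sketch. This is not the authors' prompt: the function name `build_cocot_prompt`, the step wording, and the image labelling are all hypothetical, and the sketch only assembles text; sending it to a model together with the images is left out.

```python
from typing import List


def build_cocot_prompt(question: str, num_images: int) -> str:
    """Assemble a two-stage contrastive prompt for a multi-image question.

    Hypothetical helper: the wording below is an assumption for
    illustration, not the prompt used in the paper.
    """
    if num_images < 2:
        raise ValueError("contrastive prompting needs at least two images")

    # Refer to the images by position so the model can keep them apart.
    labels = ", ".join(f"image {i}" for i in range(1, num_images + 1))

    steps: List[str] = [
        f"You are given {num_images} images: {labels}.",
        # Stage 1: ask for an explicit comparison across the images.
        "Step 1: Describe the similarities and differences among these images.",
        # Stage 2: ask the actual question, grounded in that comparison.
        "Step 2: Based on the similarities and differences you identified, "
        f"answer the following question: {question}",
    ]
    return "\n".join(steps)


if __name__ == "__main__":
    print(build_cocot_prompt("Which image shows the dog facing left?", 2))
```

Keeping the comparison as its own step before the question is the point of the sketch: it asks the model to separate what belongs to each image before it answers, which is the behavior the abstract says the method is meant to encourage.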

