8 months ago

Pan Zhang1, Xiaoyi Dong1, Bin Wang1, Yuhang Cao1, Chao Xu1, Linke Ouyang1, Zhiyuan Zhao1, Haodong Duan1, Songyang Zhang1, Shuangrui Ding1, Wenwei Zhang1, Hang Yan1, Xinyue Zhang1, Wei Li1, Jingwen Li1, Kai Chen1, Conghui He1, Xingcheng Zhang1, Yu Qiao1, Dahua Lin1, Jiaqi Wang1,2

Abstract

We propose InternLM-XComposer, a vision-language large model that enablesadvanced image-text comprehension and composition. The innovative nature of ourmodel is highlighted by three appealing properties: 1) Interleaved Text-ImageComposition: InternLM-XComposer can effortlessly generate coherent andcontextual articles that seamlessly integrate images, providing a more engagingand immersive reading experience. Simply provide a writing instruction, and oursystem will generate the corresponding manuscript. It can intelligentlyidentify the areas in the text where images would enhance the content andautomatically insert the most appropriate visual candidates. 2) Comprehensionwith Rich Multilingual Knowledge: The text-image comprehension is empowered bytraining on an extensive multi-modal multilingual database with carefullycrafted strategies, resulting in a deep understanding of visual content. 3)State-of-the-art Performance: Our model consistently achieves state-of-the-artresults across various mainstream benchmarks for vision-language foundationalmodels, including MME Benchmark, MMBench, MMBench-CN, Seed-Bench, CCBench(Chinese Cultural Benchmark), QBench and Tiny LVLM. Owing to the absence ofestablished metrics for quantitatively assessing text-image composition, wehave devised a robust evaluation procedure that comprises both human andGPT4-Vision (GPT4-V) to ensure reliability. Notably, our InternLM-XComposerachieves competitive text-image composition scores compared to publicsolutions, including GPT4-V and GPT3.5. Collectively, InternLM-XComposerseamlessly blends advanced text-image comprehension and composition,revolutionizing vision-language interaction and offering new insights andopportunities. The InternLM-XComposer model series are publicly available athttps://github.com/InternLM/InternLM-XComposer.

Source PDF View Code

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding

Ready-to-use GPUs

Best Pricing

Get Started View Pricing

HyperAI Newsletters

Subscribe to our latest updates

We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning

HyperAI

8 months ago

Multimodal

Visual Question Answering

Any-to-Any

Multimodality

Task/Problem

Pan Zhang1, Xiaoyi Dong1, Bin Wang1, Yuhang Cao1, Chao Xu1, Linke Ouyang1, Zhiyuan Zhao1, Haodong Duan1, Songyang Zhang1, Shuangrui Ding1, Wenwei Zhang1, Hang Yan1, Xinyue Zhang1, Wei Li1, Jingwen Li1, Kai Chen1, Conghui He1, Xingcheng Zhang1, Yu Qiao1, Dahua Lin1, Jiaqi Wang1,2

Abstract

Source PDF View Code

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding

Ready-to-use GPUs

Best Pricing

Get Started View Pricing

HyperAI Newsletters

Subscribe to our latest updates

We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning

HyperAI

8 months ago

Multimodal

Visual Question Answering

Any-to-Any

Multimodality

Task/Problem

Pan Zhang1, Xiaoyi Dong1, Bin Wang1, Yuhang Cao1, Chao Xu1, Linke Ouyang1, Zhiyuan Zhao1, Haodong Duan1, Songyang Zhang1, Shuangrui Ding1, Wenwei Zhang1, Hang Yan1, Xinyue Zhang1, Wei Li1, Jingwen Li1, Kai Chen1, Conghui He1, Xingcheng Zhang1, Yu Qiao1, Dahua Lin1, Jiaqi Wang1,2

Abstract

Source PDF View Code

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding

Ready-to-use GPUs

Best Pricing

Get Started View Pricing

HyperAI Newsletters

Subscribe to our latest updates

We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning

Command Palette

InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition

Pan Zhang*1, Xiaoyi Dong*1, Bin Wang1, Yuhang Cao1, Chao Xu1, Linke Ouyang1, Zhiyuan Zhao1, Haodong Duan1, Songyang Zhang1, Shuangrui Ding1, Wenwei Zhang1, Hang Yan1, Xinyue Zhang1, Wei Li1, Jingwen Li1, Kai Chen1, Conghui He1, Xingcheng Zhang1, Yu Qiao1, Dahua Lin1, Jiaqi Wang1,2

Abstract

Build AI with AI

HyperAI Newsletters

Command Palette

InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition

Pan Zhang*1, Xiaoyi Dong*1, Bin Wang1, Yuhang Cao1, Chao Xu1, Linke Ouyang1, Zhiyuan Zhao1, Haodong Duan1, Songyang Zhang1, Shuangrui Ding1, Wenwei Zhang1, Hang Yan1, Xinyue Zhang1, Wei Li1, Jingwen Li1, Kai Chen1, Conghui He1, Xingcheng Zhang1, Yu Qiao1, Dahua Lin1, Jiaqi Wang1,2

Abstract

Build AI with AI

HyperAI Newsletters

Command Palette

InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition

Pan Zhang*1, Xiaoyi Dong*1, Bin Wang1, Yuhang Cao1, Chao Xu1, Linke Ouyang1, Zhiyuan Zhao1, Haodong Duan1, Songyang Zhang1, Shuangrui Ding1, Wenwei Zhang1, Hang Yan1, Xinyue Zhang1, Wei Li1, Jingwen Li1, Kai Chen1, Conghui He1, Xingcheng Zhang1, Yu Qiao1, Dahua Lin1, Jiaqi Wang1,2

Abstract

Build AI with AI

HyperAI Newsletters

Pan Zhang1, Xiaoyi Dong1, Bin Wang1, Yuhang Cao1, Chao Xu1, Linke Ouyang1, Zhiyuan Zhao1, Haodong Duan1, Songyang Zhang1, Shuangrui Ding1, Wenwei Zhang1, Hang Yan1, Xinyue Zhang1, Wei Li1, Jingwen Li1, Kai Chen1, Conghui He1, Xingcheng Zhang1, Yu Qiao1, Dahua Lin1, Jiaqi Wang1,2

Pan Zhang1, Xiaoyi Dong1, Bin Wang1, Yuhang Cao1, Chao Xu1, Linke Ouyang1, Zhiyuan Zhao1, Haodong Duan1, Songyang Zhang1, Shuangrui Ding1, Wenwei Zhang1, Hang Yan1, Xinyue Zhang1, Wei Li1, Jingwen Li1, Kai Chen1, Conghui He1, Xingcheng Zhang1, Yu Qiao1, Dahua Lin1, Jiaqi Wang1,2

Pan Zhang1, Xiaoyi Dong1, Bin Wang1, Yuhang Cao1, Chao Xu1, Linke Ouyang1, Zhiyuan Zhao1, Haodong Duan1, Songyang Zhang1, Shuangrui Ding1, Wenwei Zhang1, Hang Yan1, Xinyue Zhang1, Wei Li1, Jingwen Li1, Kai Chen1, Conghui He1, Xingcheng Zhang1, Yu Qiao1, Dahua Lin1, Jiaqi Wang1,2