8 months ago

Yukang Feng, Jianwen Sun, Chuanhao Li, Zizhen Li, Jiaxin Ai, Fanrui Zhang, Yifan Chang, Sizhuo Zhou, Shenglin Zhang, Yu Dai

Abstract

Recent advancements in Large Multimodal Models (LMMs) have significantly improved multimodal understanding and generation. However, these models still struggle to generate tightly interleaved image-text outputs, primarily due to the limited scale, quality and instructional richness of current training datasets. To address this, we introduce InterSyn, a large-scale multimodal dataset constructed using our Self-Evaluation with Iterative Refinement (SEIR) method. InterSyn features multi-turn, instruction-driven dialogues with tightly interleaved image-text responses, providing rich object diversity and rigorous automated quality refinement, making it well-suited for training next-generation instruction-following LMMs. Furthermore, to address the lack of reliable evaluation tools capable of assessing interleaved multimodal outputs, we introduce SynJudge, an automatic evaluation model designed to quantitatively assess multimodal outputs along four dimensions: text content, image content, image quality, and image-text synergy. Experimental studies show that the SEIR method leads to substantially higher dataset quality compared to an otherwise identical process without refinement. Moreover, LMMs trained on InterSyn achieve uniform performance gains across all evaluation metrics, confirming InterSyn's utility for advancing multimodal systems.
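The abstract names two ideas that are easy to picture in code: a score along four dimensions, and a loop that evaluates a draft and refines it until it is good enough. The following is only a generic sketch of that pattern, not the authors' SEIR procedure or the SynJudge model, neither of which is specified here. The names `FourAxisScores` and `refine_until_good`, the averaging rule, the `threshold`, and the `max_rounds` cap are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class FourAxisScores:
    """Hypothetical container for the four dimensions named in the abstract."""
    text_content: float
    image_content: float
    image_quality: float
    image_text_synergy: float

    def mean(self) -> float:
        # Assumption: a plain average; the paper may aggregate differently.
        total = (self.text_content + self.image_content
                 + self.image_quality + self.image_text_synergy)
        return total / 4


def refine_until_good(
    draft: str,
    evaluate: Callable[[str], FourAxisScores],
    refine: Callable[[str, FourAxisScores], str],
    threshold: float = 4.0,   # assumed acceptance score
    max_rounds: int = 3,      # assumed cap on refinement rounds
) -> Tuple[str, FourAxisScores, int]:
    """Evaluate a draft, then refine and re-evaluate until it passes or rounds run out."""
    scores = evaluate(draft)
    rounds = 0
    while scores.mean() < threshold and rounds < max_rounds:
        draft = refine(draft, scores)
        scores = evaluate(draft)
        rounds += 1
    return draft, scores, rounds


# Toy stand-ins for a real evaluator and refiner (purely illustrative).
def toy_evaluate(draft: str) -> FourAxisScores:
    s = float(min(len(draft), 5))
    return FourAxisScores(s, s, s, s)


def toy_refine(draft: str, scores: FourAxisScores) -> str:
    return draft + "!"


if __name__ == "__main__":
    print(refine_until_good("ab", toy_evaluate, toy_refine, max_rounds=5))
```

In practice `evaluate` and `refine` would be calls to models; passing them in as plain callables keeps the loop itself testable without any model in place.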

A High-Quality Dataset and Reliable Evaluation for Interleaved Image-Text Generation
