MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering

Visual language data such as plots, charts, and infographics are ubiquitous in the human world. However, state-of-the-art vision-language models do not perform well on these data. We propose MatCha (Math reasoning and Chart derendering pretraining) to enhance visual language models' capabilities in jointly modeling charts/plots and language data. Specifically, we propose several pretraining tasks that cover plot deconstruction and numerical reasoning, which are the key capabilities in visual language modeling. We perform the MatCha pretraining starting from Pix2Struct, a recently proposed image-to-text visual language model. On standard benchmarks such as PlotQA and ChartQA, the MatCha model outperforms state-of-the-art methods by as much as nearly 20%. We also examine how well MatCha pretraining transfers to domains such as screenshots, textbook diagrams, and document figures and observe overall improvement, verifying the usefulness of MatCha pretraining on broader visual language tasks.