Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs

Carbune, Victor; Mansoor, Hassan; Liu, Fangyu; Aralikatte, Rahul; Baechler, Gilles; Chen, Jindong; Sharma, Abhanshu
Abstract

Vision-language models (VLMs) are achieving increasingly strong performance on multimodal tasks. However, their reasoning capabilities remain limited, particularly for smaller VLMs, while those of large language models (LLMs) have seen numerous improvements. We propose a technique to transfer capabilities from LLMs to VLMs. On the recently introduced ChartQA, our method obtains state-of-the-art performance when applied to the PaLI3-5B VLM by \citet{chen2023pali3}, while also enabling much better performance on PlotQA and FigureQA. We first improve the chart representation by continuing the pre-training stage using an improved version of the chart-to-table translation task by \citet{liu2023deplot}. We then propose constructing a dataset 20x larger than the original training set. To improve general reasoning capabilities and numerical operations, we synthesize reasoning traces using the table representation of charts. Lastly, our model is fine-tuned using the multitask loss introduced by \citet{hsieh2023distilling}. Our variant, ChartPaLI-5B, outperforms even 10x larger models such as PaLIX-55B without using an upstream OCR system, while keeping inference time constant compared to the PaLI3-5B baseline. When rationales are further refined with a simple program-of-thought prompt \cite{chen2023program}, our model outperforms the recently introduced Gemini Ultra and GPT-4V.
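The multitask loss of \citet{hsieh2023distilling} trains the model on two targets for the same input, the answer (label) and the rationale, and combines them as L = L_label + λ · L_rationale, where both terms are cross-entropy losses. The sketch below illustrates that objective for a single training example, assuming the probabilities the model assigned to each gold target token have already been gathered. The function names and the default λ = 1.0 (`rationale_weight`) are hypothetical choices for illustration, not the authors' implementation.

```python
import math
from typing import Sequence


def sequence_cross_entropy(gold_token_probs: Sequence[float]) -> float:
    """Mean negative log-likelihood over one target sequence.

    gold_token_probs[i] is the probability the model assigned to the i-th
    gold target token (assumed to be gathered from the softmax already).
    """
    if not gold_token_probs:
        raise ValueError("target sequence must be non-empty")
    return -sum(math.log(p) for p in gold_token_probs) / len(gold_token_probs)


def multitask_loss(
    label_probs: Sequence[float],
    rationale_probs: Sequence[float],
    rationale_weight: float = 1.0,  # hypothetical default for lambda
) -> float:
    """Illustrative sketch of L = L_label + lambda * L_rationale.

    label_probs: gold-token probabilities for the answer target.
    rationale_probs: gold-token probabilities for the rationale target.
    """
    label_loss = sequence_cross_entropy(label_probs)
    rationale_loss = sequence_cross_entropy(rationale_probs)
    return label_loss + rationale_weight * rationale_loss


# Usage: a two-token answer predicted with p=0.5 per token plus a one-token
# rationale predicted with certainty gives a total loss of ln(2).
print(multitask_loss([0.5, 0.5], [1.0]))
```

Averaging over tokens keeps the two terms on a comparable scale regardless of rationale length; whether the original work averages or sums over tokens is an assumption of this sketch.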