
UniChart: A Universal Vision-language Pretrained Model for Chart Comprehension and Reasoning

Masry, Ahmed ; Kavehzadeh, Parsa ; Do, Xuan Long ; Hoque, Enamul ; Joty, Shafiq
Abstract

Charts are very popular for analyzing data, visualizing key insights, and answering complex reasoning questions about data. To facilitate chart-based data analysis using natural language, several downstream tasks have been introduced recently, such as chart question answering and chart summarization. However, most of the methods that solve these tasks use pretraining on language or vision-language tasks that do not attempt to explicitly model the structure of the charts (e.g., how data is visually encoded and how chart elements are related to each other). To address this, we first build a large corpus of charts covering a wide variety of topics and visual styles. We then present UniChart, a pretrained model for chart comprehension and reasoning. UniChart encodes the relevant text, data, and visual elements of charts and then uses a chart-grounded text decoder to generate the expected output in natural language. We propose several chart-specific pretraining tasks that include: (i) low-level tasks to extract the visual elements (e.g., bars, lines) and data from charts, and (ii) high-level tasks to acquire chart understanding and reasoning skills. We find that pretraining the model on a large corpus with chart-specific low- and high-level tasks, followed by finetuning, results in state-of-the-art performance on three downstream tasks.
