StructChart: On the Schema, Metric, and Augmentation for Visual Chart Understanding

Charts are common in the literature across various scientific fields, conveying rich information in a form easily accessible to readers. Current chart-related tasks focus either on chart perception, which extracts information from visual charts, or on chart reasoning over the extracted data, e.g. in tabular form. In this paper, we introduce StructChart, a novel framework that leverages Structured Triplet Representations (STR) to achieve a unified and label-efficient approach to chart perception and reasoning. The approach is generally applicable to different downstream tasks, beyond the question-answering task specifically studied in peer works. StructChart first reformulates the chart data from the tabular form (linearized CSV) to STR, which reduces the task gap between chart perception and reasoning. We then propose a Structuring Chart-oriented Representation Metric (SCRM) to quantitatively evaluate performance on the chart perception task. To augment the training data, we further explore the potential of Large Language Models (LLMs) to enhance diversity in both chart visual style and statistical information. Extensive experiments on various chart-related tasks demonstrate the effectiveness and potential of a unified chart perception-reasoning paradigm to push the frontier of chart understanding.
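To make the reformulation from linearized CSV to triplets concrete, the following is a minimal sketch rather than the paper's implementation. It assumes that a triplet pairs a row label and a column label with the cell value; the abstract does not give the exact STR format, and the function name `csv_to_triplets` and the sample table are hypothetical.

```python
import csv
import io


def csv_to_triplets(linearized_csv: str) -> list[tuple[str, str, str]]:
    """Turn a header-first CSV string into (row_label, column_label, value) triplets.

    Assumption: the first row holds column headers and the first column holds
    row labels. This layout is illustrative, not the paper's STR definition.
    """
    rows = [
        [cell.strip() for cell in row]
        for row in csv.reader(io.StringIO(linearized_csv))
        if row  # skip blank lines
    ]
    header, body = rows[0], rows[1:]
    triplets = []
    for row in body:
        row_label = row[0]
        # Pair every remaining cell with its column header.
        for column_label, value in zip(header[1:], row[1:]):
            triplets.append((row_label, column_label, value))
    return triplets


# Hypothetical chart data in linearized CSV form.
table = "Year,Sales,Profit\n2021,10,3\n2022,12,4"
triplets = csv_to_triplets(table)

# The same data with rows and columns reordered.
reordered = "Year,Profit,Sales\n2022,4,12\n2021,3,10"
same_content = set(csv_to_triplets(reordered)) == set(triplets)
```

In this sketch, comparing the triplets as sets ignores row and column order, so two tables that hold the same data in a different layout compare equal. A comparison on the raw CSV strings would treat them as different.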