
Abstract

We introduce FigureQA, a visual reasoning corpus of over one million question-answer pairs grounded in over 100,000 images. The images are synthetic, scientific-style figures from five classes: line plots, dot-line plots, vertical and horizontal bar graphs, and pie charts. We formulate our reasoning task by generating questions from 15 templates; questions concern various relationships between plot elements and examine characteristics like the maximum, the minimum, area-under-the-curve, smoothness, and intersection. To resolve, such questions often require reference to multiple plot elements and synthesis of information distributed spatially throughout a figure. To facilitate the training of machine learning systems, the corpus also includes side data that can be used to formulate auxiliary objectives. In particular, we provide the numerical data used to generate each figure as well as bounding-box annotations for all plot elements. We study the proposed visual reasoning task by training several models, including the recently proposed Relation Network as a strong baseline. Preliminary results indicate that the task poses a significant machine learning challenge. We envision FigureQA as a first step towards developing models that can intuitively recognize patterns from visual representations of data.
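To make the question types concrete, the following is a minimal sketch of how yes/no answers for a few of the characteristics named above (maximum, minimum, area-under-the-curve, intersection) could be derived from the numerical data behind a line plot. It is an illustration only, not the corpus generation code: the function names, the series names, the sample values, and the assumption that all series share the same x-coordinates are hypothetical.

```python
import numpy as np

# Hypothetical line-plot data: every series is sampled at the same x positions.
x = np.array([0.0, 1.0, 2.0, 3.0])
series = {
    "red": np.array([0.0, 1.0, 2.0, 3.0]),
    "blue": np.array([3.0, 2.0, 1.0, 0.5]),
    "green": np.array([5.0, 5.0, 5.0, 5.0]),
}

def has_maximum(name):
    """Does the named series contain the largest value in the figure?"""
    return bool(series[name].max() >= max(y.max() for y in series.values()))

def has_minimum(name):
    """Does the named series contain the smallest value in the figure?"""
    return bool(series[name].min() <= min(y.min() for y in series.values()))

def area_under_curve(y):
    """Trapezoid-rule area under a piecewise-linear curve sampled at x."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x) / 2.0))

def has_max_area(name):
    """Does the named series have the largest area under the curve?"""
    areas = {k: area_under_curve(y) for k, y in series.items()}
    return areas[name] >= max(areas.values())

def intersects(a, b):
    """Do two series touch or cross between consecutive samples?"""
    d = series[a] - series[b]
    # A zero difference means the curves touch at a sample point;
    # a sign change between neighbours means they cross in between.
    return bool(np.any(d == 0) or np.any(d[:-1] * d[1:] < 0))

print(has_maximum("green"), has_minimum("red"))
print(has_max_area("green"), intersects("red", "blue"))
```

Each function looks at more than one plot element before answering, which mirrors the point that these questions cannot be resolved from a single element in isolation.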
