8 months ago

Visual Question Answering

Computer Vision

Kushal Kafle, Robik Shrestha, Brian Price, Scott Cohen, Christopher Kanan

Abstract

Chart question answering (CQA) is a newly proposed visual question answering (VQA) task where an algorithm must answer questions about data visualizations, e.g., bar charts, pie charts, and line graphs. CQA requires capabilities that natural-image VQA algorithms lack: fine-grained measurements, optical character recognition, and handling out-of-vocabulary words in both questions and answers. Without modifications, state-of-the-art VQA algorithms perform poorly on this task. Here, we propose a novel CQA algorithm called parallel recurrent fusion of image and language (PReFIL). PReFIL first learns bimodal embeddings by fusing question and image features and then intelligently aggregates these learned embeddings to answer the given question. Despite its simplicity, PReFIL greatly surpasses state-of-the-art systems and human baselines on both the FigureQA and DVQA datasets. Additionally, we demonstrate that PReFIL can be used to reconstruct tables by asking a series of questions about a chart.
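The table-reconstruction idea in the last sentence can be illustrated with a minimal sketch. This is not the authors' implementation. The `reconstruct_table` function, the question template, and the `toy_answer` stand-in are all hypothetical, and the sketch assumes the row and column labels of the chart are already known. Any chart-QA model could be plugged in as the `answer` callable.

```python
from typing import Callable, Dict, List

# Hypothetical interface: a chart-QA model wrapped as (chart, question) -> answer.
AnswerFn = Callable[[object, str], str]


def reconstruct_table(
    chart: object,
    answer: AnswerFn,
    rows: List[str],
    cols: List[str],
) -> Dict[str, Dict[str, str]]:
    """Rebuild a table by asking one value question per (row, column) cell."""
    table: Dict[str, Dict[str, str]] = {}
    for row in rows:
        table[row] = {}
        for col in cols:
            # Illustrative question template, not taken from any dataset.
            question = f"What is the value of {row} in {col}?"
            table[row][col] = answer(chart, question)
    return table


def toy_answer(chart: dict, question: str) -> str:
    """Stand-in for a trained model: looks the answer up in a dict 'chart'."""
    for (row, col), value in chart.items():
        if f"of {row} in {col}?" in question:
            return str(value)
    return "unknown"


# The toy 'chart' is the ground-truth data itself, keyed by (row, column).
chart = {
    ("apples", "2019"): 3,
    ("apples", "2020"): 5,
    ("pears", "2019"): 2,
    ("pears", "2020"): 4,
}
table = reconstruct_table(chart, toy_answer, ["apples", "pears"], ["2019", "2020"])
```

With a real model, `chart` would be an image and each answer would be a prediction, so the reconstructed table is only as accurate as the per-question answers.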

Answering Questions about Data Visualizations using Efficient Bimodal Fusion