SPIQA Multimodal Scientific Paper Question Answering Dataset
This dataset was released by a research team from Google Research and Johns Hopkins University in 2024. The associated paper is "SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers".
Background
Seeking answers to questions within long scientific research articles is an important research area that helps readers quickly resolve their queries. However, existing question answering (QA) datasets based on scientific papers are limited in size and focus only on textual content. To address this limitation, the research team introduced SPIQA (Scientific Paper Image Question Answering).
Dataset Overview
SPIQA is the first large-scale QA dataset specifically designed for interpreting complex figures and tables in scientific research articles across various domains of computer science. Leveraging the breadth of expertise of multimodal large language models (MLLMs) and their ability to understand figures, the research team used automatic and manual curation to create the dataset, crafting an information-seeking task involving multiple images that cover a wide variety of plots, charts, tables, schematic diagrams, and result visualizations. SPIQA comprises 270K questions divided into training, validation, and three different evaluation splits. Through extensive experiments with 12 prominent foundation models, the team evaluated the ability of current multimodal systems to comprehend the nuanced aspects of research articles.
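To make the split structure concrete, the minimal sketch below shows how one might iterate over a locally downloaded SPIQA split and count its papers and questions. The file name "SPIQA_train.json" and the field names ("qa", "question", "answer") are assumptions for illustration only; consult the official release for the actual schema.

```python
import json

# A sketch of inspecting one locally downloaded SPIQA split.
# File name and field names below are hypothetical, not the documented schema.
with open("SPIQA_train.json", "r", encoding="utf-8") as f:
    split = json.load(f)  # assumed: mapping from paper ID to its annotations

num_questions = 0
for paper_id, paper in split.items():
    for qa in paper.get("qa", []):  # hypothetical list of QA pairs per paper
        # Each entry is assumed to pair a question with an answer grounded
        # in one or more figures or tables of the paper.
        num_questions += 1

print(f"papers: {len(split)}  questions: {num_questions}")
```

Run against the actual release, a loop like this would report the per-split question counts that together make up the 270K questions described above.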