
VEGA Scientific Paper Graphics and Text Data Understanding Dataset

Date

10 months ago

Size

45.22 GB

Organization

Xiamen University

Publish URL

github.com


VEGA is a multimodal dataset for scientific paper understanding. Ji Rongrong's team at Xiamen University proposed it in 2024 to evaluate and improve how models handle inputs that interleave complex text and image information. The accompanying paper is "VEGA: Learning Interleaved Image-Text Comprehension in Vision-Language Large Models". The dataset contains text and images from more than 50,000 scientific papers and is built specifically for the Interleaved Image-Text Comprehension (IITC) task. It provides longer and more complex interleaved text-image content as input, and it requires the model to identify the image it refers to when answering.

VEGA is derived from SciGraphQA, a dataset for understanding figures in papers that contains 295k question-answer pairs. The research team converted SciGraphQA into VEGA in three steps: question screening, context construction, and answer modification. The result contains 593,000 training samples drawn from scientific papers and 2,326 test samples spanning 2 different tasks.

  • Question screening: Some questions in the original dataset lack explicit image references. This causes ambiguity once the input is expanded to multiple images, so such questions were screened out.
  • Context construction: The original dataset pairs each image with a single question-answer pair and provides little context. To increase the amount of text and images, the research team downloaded the source files of the corresponding papers from arXiv and built samples at two context lengths, 4k tokens and 8k tokens. Each question-answer pair is accompanied by at most 8 images.
  • Answer modification: The authors revised the answers in the original dataset to state which image each answer refers to, as the IITC task requires.
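The three construction steps can be illustrated with a short Python sketch. This is a minimal example under stated assumptions, not the VEGA authors' code. Every specific in it is a hypothetical choice made for illustration: the regular expression used for screening, the `Segment` type, the precomputed per-segment token counts, the greedy packing strategy, and the `[Image k]` answer prefix. The 4k budget is taken as 4096 tokens, and the cap of 8 images follows the description above.

```python
import re
from dataclasses import dataclass

# Hypothetical screening rule: a question counts as having an explicit image
# reference if it mentions "Figure N" or "Fig. N".
FIG_REF = re.compile(r"\b(?:figure|fig\.)\s*\d+", re.IGNORECASE)


@dataclass
class Segment:
    kind: str     # "text" or "image" (hypothetical schema)
    content: str  # paragraph text, or an image identifier
    tokens: int   # token cost of the segment, assumed to be precomputed


def screen_question(question: str) -> bool:
    """Step 1 sketch: keep only questions with an explicit figure reference."""
    return FIG_REF.search(question) is not None


def build_context(segments, target_id, budget=4096, max_images=8):
    """Step 2 sketch: greedily pack paper segments, in paper order, into a
    token budget (e.g. 4096 or 8192) with at most `max_images` images.
    The target image the question is about is always kept."""
    target = next(
        (s for s in segments if s.kind == "image" and s.content == target_id),
        None,
    )
    if target is None:
        raise ValueError(f"target image {target_id!r} not in segments")
    used, images = target.tokens, 1  # reserve room for the target image
    context = []
    for seg in segments:
        if seg is target:
            context.append(seg)
            continue
        if used + seg.tokens > budget:
            continue  # skip segments that would overflow the budget
        if seg.kind == "image":
            if images >= max_images:
                continue  # image cap reached
            images += 1
        used += seg.tokens
        context.append(seg)
    return context


def modify_answer(answer: str, image_index: int) -> str:
    """Step 3 sketch: prefix the answer with the (1-based) index of the
    image it relies on, using a hypothetical "[Image k]" format."""
    return f"[Image {image_index}] {answer}"
```

Passing `budget=8192` gives the 8k variant. Reserving the target image's tokens before packing ensures that the image a question depends on is never dropped from its own context, while the remaining text and images fill whatever budget is left.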
VEGA.torrent
  • VEGA/
    • README.md (2.43 KB)
    • README.txt (4.86 KB)
    • data/
      • VEGA.zip (45.22 GB)