
Multimodal ArXiv Scientific Understanding Dataset

Date: a year ago

Organization: The University of Hong Kong

Paper URL: arxiv.org


Multimodal ArXiv was released in 2024 by the University of Hong Kong and Peking University. The accompanying paper, "Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models", was accepted at ACL 2024.

The dataset comprises two components, ArXivCap and ArXivQA, designed to improve the scientific comprehension of large vision-language models (LVLMs).

ArXivCap is a figure-caption dataset containing 6.4 million images and 3.9 million captions drawn from 572K ArXiv papers spanning a wide range of scientific fields.
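
As a minimal sketch of how the figure-caption data might be browsed, the snippet below streams a few records with the Hugging Face `datasets` library. The repository ID "MMInstruction/ArxivCap" and the assumption that each record pairs a paper's figures with their captions are not confirmed by this page; consult the official dataset card for the exact identifier and schema.

```python
# Sketch: streaming a few ArXivCap records with the Hugging Face `datasets` library.
# The repo ID "MMInstruction/ArxivCap" and the per-record fields are assumptions;
# check the official dataset card for the exact schema.
from itertools import islice

from datasets import load_dataset

# Stream the split so the 6.4M images are not downloaded up front.
arxivcap = load_dataset("MMInstruction/ArxivCap", split="train", streaming=True)

for record in islice(arxivcap, 3):
    # Each record is expected to pair one paper's figures with their captions.
    print(record.keys())
```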

Building on ArXivCap, the research team introduced ArXivQA, a question-answering dataset generated by prompting GPT-4V with scientific figures. ArXivQA substantially improves the mathematical reasoning capabilities of open-source LVLMs, yielding a 10.4% absolute accuracy gain on a multimodal mathematical reasoning benchmark.
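
To make the data format concrete, here is a purely illustrative, hypothetical ArXivQA-style record: a multiple-choice question grounded in a single figure. The field names and values are invented for illustration and are not the official schema.

```python
# Illustrative only: a hypothetical ArXivQA-style record showing the kind of
# multiple-choice question GPT-4V produces from a scientific figure.
# Field names and values are invented, not the official schema.
qa_example = {
    "image": "figures/example_plot.png",  # the source scientific figure
    "question": "Which method converges fastest in the plot?",
    "options": ["A. Baseline", "B. Method X", "C. Method Y", "D. Method Z"],
    "label": "B",                          # correct option letter
    "rationale": "Method X reaches its plateau in the fewest training steps.",
}

print(qa_example["question"], "->", qa_example["label"])
```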

