HyperAI

Llama Nemotron VLM v1 Multimodal Image and Text Dataset

Date

2 months ago

Size

98.09 GB

Organization

NVIDIA

License

CC BY 4.0

Llama Nemotron VLM Dataset v1 is a high-quality image-and-text dataset released by NVIDIA in 2025 for VLM (vision-language model) post-training. It supports NVIDIA's Llama-3.1-Nemotron-Nano-VL-8B-V1 document-understanding model, which covers scenarios such as document question answering, chart question answering, and AI2D.

The dataset consists of 21 subsets totaling 2,863,854 samples across three categories: visual question answering (VQA), captioning (image description), and optical character recognition (OCR). It includes re-annotated public image datasets; fully and semi-synthetic OCR data in Chinese and English at the character, word, and page levels; and internally annotated OCR sets. The original QA pairs and captions have also been refined and enhanced, making the dataset suitable for multimodal training and evaluation in applications such as intelligent agents, chat assistants, and RAG.

The data includes:

  • VQA (visual question answering): 1,917,755 samples
  • Captioning (image description): 131,718 samples
  • OCR (optical character recognition): 814,381 samples
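As a quick sanity check, the three per-category counts above should add up to the stated dataset total of 2,863,854 samples. A minimal Python sketch (the category keys are illustrative, not official subset names):

```python
# Per-category sample counts as listed in the dataset description.
counts = {
    "vqa": 1_917_755,        # visual question answering
    "captioning": 131_718,   # image description
    "ocr": 814_381,          # optical character recognition
}

total = sum(counts.values())
print(total)  # 2863854 — matches the stated 2,863,854 samples
```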

Llama-Nemotron-VLM-Dataset-v1.torrent
Seeding 2 · Downloading 0 · Completed 12 · Total Downloads 43
  • Llama-Nemotron-VLM-Dataset-v1/
    • README.md
      1.65 KB
    • README.txt
      3.3 KB
    • data/
      • Llama-Nemotron-VLM-Dataset-v1.zip
        98.09 GB
