Document Haystack Multimodal Document Benchmark Dataset

Date: 3 months ago

Size: 14.6 GB

Organization: Amazon

Paper URL: 2507.15882

Document Haystack is a multimodal document benchmark dataset released by Amazon AGI in 2025. It accompanies the paper "Document Haystack: A Long Context Multimodal Image/Document Understanding Vision LLM Benchmark" and is designed to evaluate how well vision language models (VLMs) retrieve and understand information in long, complex documents.

The dataset contains 400 document variants and 8,250 retrieval questions, built on real documents ranging from 5 to 200 pages. Each document is provided as the original PDF, as per-page images rendered at 200 DPI, and as parsed plain-text files. The dataset is suitable for tasks such as Question-Answering and Visual Question-Answering.

document-haystack.torrent
Seeding: 1 | Downloading: 0 | Completed: 34 | Total Downloads: 83
  • document-haystack/
    • README.md (1.38 KB)
    • README.txt (2.76 KB)
    • data/
      • document-haystack.zip (14.6 GB)
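
After downloading and extracting document-haystack.zip, a quick tally of files by extension can confirm that the three formats described above (PDFs, page images, and plain-text files) are all present. The following is a minimal sketch only: the function name count_by_extension and the extraction path are hypothetical, and the archive's internal folder layout is not documented on this page, so the script simply walks whatever directory it is given.

```python
from collections import Counter
from pathlib import Path


def count_by_extension(root):
    """Count files under `root`, grouped by lowercase file extension."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Files without an extension are grouped under "<none>".
            counts[path.suffix.lower() or "<none>"] += 1
    return counts


if __name__ == "__main__":
    # Assumed location of the extracted archive; adjust to your setup.
    extracted_dir = Path("document-haystack")
    if extracted_dir.is_dir():
        for ext, n in sorted(count_by_extension(extracted_dir).items()):
            print(f"{ext}: {n}")
    else:
        print(f"Directory not found: {extracted_dir}")
```

If one of the expected extensions is missing from the output, the extraction was probably incomplete or pointed at the wrong directory.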
