Document Haystack Multimodal Document Benchmark Dataset
Document Haystack is a multimodal document benchmark dataset released by Amazon AGI in 2025. It accompanies the paper "Document Haystack: A Long Context Multimodal Image/Document Understanding Vision LLM Benchmark" and is designed to evaluate how well vision language models (VLMs) retrieve and understand information in long, complex documents.
The dataset contains 400 document variants and 8,250 retrieval questions, covering real documents ranging from 5 to 200 pages. The data is provided in three formats: original PDFs, per-page images at 200 DPI, and parsed plain-text files. It is suitable for tasks such as question answering and visual question answering.