Envision Multi-Stage Event Visual Generation Dataset
Envision is a dataset of multi-image and text pairs released by the Shanghai Artificial Intelligence Laboratory in 2025. The related research paper is titled "Envision: Benchmarking Unified Understanding & Generation for Causal World Process Insights". The dataset is designed to test a model's ability to understand causality and to generate multi-stage events in real-world situations.
The dataset contains 1,000 event sequences with 4,000 text prompts in a four-stage structure, covering six subject categories across the natural sciences and history/humanities. The event materials are sourced from textbooks and online resources and selected by experts. GPT-4o is then used to generate and polish them into narrative prompts with clear causal chains and a progressive stage structure.
Data composition:
- Subject coverage (6 categories in total)
  - Natural Sciences (75%): Physics, Chemistry, Biology, Meteorology, Geography
  - History and Culture (25%)
- Causal structure types
  - Continuous causality: continuous changes within the same spatial scene, suited to fine-grained physical and chemical processes.
  - Discrete causality: jumps across stages in time and space, suited to geological evolution, life cycles, and historical events.
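
The composition described above can be pictured as a simple record type. The following is only a minimal sketch for illustration: the names `EventSequence`, `Causality`, `DOMAINS`, and `total_prompts` are hypothetical and do not reflect the dataset's actual schema or file format, and the prompt strings are placeholders rather than real data.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Causality(Enum):
    CONTINUOUS = "continuous"  # continuous change within one spatial scene
    DISCRETE = "discrete"      # jumps across stages in time and space


# Subject categories as listed in the data composition; the dict layout is an assumption.
DOMAINS = {
    "Natural Sciences": ["Physics", "Chemistry", "Biology", "Meteorology", "Geography"],
    "History and Culture": ["History and Culture"],
}

STAGES_PER_SEQUENCE = 4  # each event sequence has four stages


@dataclass
class EventSequence:
    """Hypothetical record layout, not the dataset's real schema."""
    subject: str
    causality: Causality
    stage_prompts: List[str]

    def __post_init__(self) -> None:
        # Enforce the four-stage structure.
        if len(self.stage_prompts) != STAGES_PER_SEQUENCE:
            raise ValueError("an event sequence needs exactly four stage prompts")
        # Enforce that the subject is one of the six listed categories.
        valid = {s for group in DOMAINS.values() for s in group}
        if self.subject not in valid:
            raise ValueError(f"unknown subject: {self.subject}")


def total_prompts(num_sequences: int) -> int:
    """Number of stage-level prompts for a given number of sequences."""
    return num_sequences * STAGES_PER_SEQUENCE


# Placeholder prompts only; real prompts come from the dataset itself.
example = EventSequence(
    subject="Chemistry",
    causality=Causality.CONTINUOUS,
    stage_prompts=[f"<stage {i} prompt>" for i in range(1, 5)],
)
print(total_prompts(1000))
```

Under these assumptions, 1,000 sequences with four stages each account for the 4,000 prompts stated above, and the five natural science subjects plus History and Culture give the six categories.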
