HyperAI



DeepMind's Dreamer 4 learns complex tasks in imagination using a scalable world model, achieving diamond mining in Minecraft without real gameplay

6 days ago

DeepMind has introduced Dreamer 4, a new AI agent capable of learning to perform complex tasks entirely within a scalable world model, using only a small amount of pre-recorded video data. Unlike previous AI systems that rely on millions of trial-and-error interactions in real or simulated environments, Dreamer 4 learns by imagining and practicing in a simulated world built from observed gameplay. The breakthrough lies in the agent's ability to master long-horizon, real-world-like tasks, such as mining diamonds in Minecraft, without ever playing the game directly. Instead, it trains on a dataset of recorded videos showing human players interacting with the game, using those sequences to build a detailed internal model of how the world works. From there, it uses reinforcement learning to explore countless imagined scenarios, refining its behavior without physical risk.

This approach is especially significant for robotics, where real-world testing is slow, expensive, and potentially dangerous. By training agents in a virtual world that accurately simulates physics and object interactions, Dreamer 4 demonstrates a path toward safer, faster, and more efficient robot development.

Dreamer 4 is built on a large transformer-based architecture with a novel training method called shortcut forcing, which improves prediction accuracy and speeds up simulation generation by over 25 times compared to standard video models. The model runs in real time on a single GPU, enabling interactive exploration and testing. Researchers found that Dreamer 4 accurately predicts complex mechanics such as breaking blocks, crafting tools, using doors and chests, and even operating boats, tasks that require an understanding of cause and effect, timing, and spatial reasoning. Its world model significantly outperforms earlier versions, capturing not just visual details but the underlying dynamics of the environment.
One of the most striking aspects of Dreamer 4 is its ability to learn from minimal action data. While earlier models required thousands of hours of gameplay labeled with precise actions, Dreamer 4 learns the effects of mouse and keyboard inputs from video alone. With just a few hundred hours of recorded footage, it generalizes well to new situations, suggesting it could one day learn from internet videos of human activities.

The team envisions future improvements, including adding long-term memory to maintain consistency over extended simulations and integrating language understanding to enable collaboration with humans. Ultimately, training the model on diverse internet videos could give it common-sense knowledge of the physical world, allowing robots to be trained in rich, imagined environments. This work marks a major step toward building intelligent agents that can plan, adapt, and act in complex, open-ended environments, bringing us closer to a future where AI can assist with real-world tasks, from household chores to industrial automation.
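One common way to learn action effects when most footage lacks action labels is an inverse-dynamics idea: use a small labeled set to learn which action explains a given frame transition, then pseudo-label the unlabeled video. The sketch below is a toy illustration of that general idea, not Dreamer 4's published method; the "frames", action set, and nearest-neighbor classifier are all invented for clarity.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in: 'frames' are 2-D vectors and actions are indices 0..2.
# The small labeled set mimics the few hundred hours of action-annotated
# footage; in a real system the labels would come with the recordings.
ACTIONS = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])  # assumed effects

def transition(frame, a):
    """Toy environment: each action shifts the frame plus small noise."""
    return frame + ACTIONS[a] + rng.normal(scale=0.01, size=2)

labeled = []
for _ in range(30):
    f = rng.normal(size=2)
    a = int(rng.integers(3))
    labeled.append((f, transition(f, a), a))

def infer_action(f0, f1):
    """Inverse dynamics by nearest neighbor over frame differences:
    a crude stand-in for a learned inverse-dynamics model."""
    diff = f1 - f0
    dists = [np.linalg.norm(diff - (g1 - g0)) for g0, g1, _ in labeled]
    return labeled[int(np.argmin(dists))][2]

# Pseudo-label one transition from an 'unlabeled video'.
f0 = rng.normal(size=2)
true_a = 1
f1 = transition(f0, true_a)
pred_a = infer_action(f0, f1)
```

Once unlabeled footage is pseudo-labeled this way, it can be folded into world-model training as if it carried real action annotations, which is what makes learning from large video collections plausible.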
