Google's Gemini AI Completes Pokémon Blue with Software Engineer's Assistance
Google's Gemini has completed Pokémon Blue, a 29-year-old video game. Google CEO Sundar Pichai celebrated the accomplishment on X by posting, “What a finish! Gemini 2.5 Pro just completed Pokémon Blue!”

The livestream, titled "Gemini Plays Pokémon," was created by Joel Z, a 30-year-old software engineer who is not affiliated with Google. Nonetheless, Google executives, including Logan Kilpatrick, the product lead for Google AI Studio, have been enthusiastic supporters of the project. Kilpatrick posted last month, “Gemini is making great progress at completing Pokémon and has earned its 5th badge, far ahead of any other model, which has only managed 3 badges so far, albeit with a different agent harness.” Pichai humorously chimed in, “We are working on API, Artificial Pokémon Intelligence:).”

Why did Joel Z choose Pokémon Blue for this challenge? The decision can be traced back to February, when Anthropic, another AI research company, highlighted the progress its Claude model was making in Pokémon Red. Anthropic noted that Claude’s extended thinking and agent training gave it a significant advantage in handling unpredictable tasks, such as playing a classic game. Pokémon Red and Blue are two versions of a Game Boy title first released in 1996 and are part of the enduring Pokémon franchise. That effort, streamed on Twitch as "Claude Plays Pokémon," inspired Joel Z to take on the same kind of challenge with Gemini. Claude, however, has yet to beat Pokémon Red.

Does this suggest that Gemini is superior to Claude at the game? Joel Z cautions against making direct comparisons. He explains, “Please don’t consider this a benchmark for how well an LLM can play Pokémon. Each model uses different tools and receives different information, making it difficult to determine which one is better.”

Both AI models rely on external assistance to play the game effectively. This assistance, known as an agent harness, provides the model with game screenshots and additional context. The model then makes decisions based on this information, sometimes delegating tasks to specialized agents, before executing its chosen actions through the corresponding button presses.

Joel Z also admitted that there were some developer interventions to help Gemini complete the game, but he maintains that these do not constitute cheating. “My interventions improve Gemini’s overall decision-making and reasoning abilities,” he explained. “I don’t provide specific hints or direct instructions for particular challenges, such as Mt. Moon. The only assistance I’ve given is informing Gemini that it needs to interact with a Rocket Grunt twice to get the Lift Key, a bug that was addressed in later versions of the game, such as Pokémon Yellow.”

The Gemini Plays Pokémon project is ongoing, and its framework continues to evolve. That ongoing development reflects both the complexity of interactive tasks for AI and the collaboration between developers and models needed to complete them.

Completing Pokémon Blue with the help of external tools and guidance shows both the current strengths and the current limitations of AI in gaming. While Gemini has made significant strides, it still requires human support to navigate certain aspects of the game. The project offers insight into how AI might be improved and integrated into more complex interactive applications.
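
For readers curious about what an agent harness looks like in practice, the loop described in this article (screenshot and context in, decision made, button press out) can be sketched roughly as follows. This is only an illustrative sketch and not Joel Z's actual code: FakeEmulator, Observation, scripted_model, and run_harness are hypothetical names, and the scripted model is a simple stand-in for a real call to an LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Buttons available on the original Game Boy.
BUTTONS = ("A", "B", "UP", "DOWN", "LEFT", "RIGHT", "START", "SELECT")


@dataclass
class Observation:
    screenshot: bytes  # image data the harness would capture from the game
    context: str       # extra text the harness supplies alongside the image


@dataclass
class FakeEmulator:
    """Hypothetical stand-in for a real emulator; it only records presses."""
    pressed: List[str] = field(default_factory=list)

    def observe(self) -> Observation:
        # A real harness would grab the current frame; this sketch sends none.
        return Observation(b"", f"buttons pressed so far: {len(self.pressed)}")

    def press(self, button: str) -> None:
        if button not in BUTTONS:
            raise ValueError(f"unknown button: {button}")
        self.pressed.append(button)


# A "model" here is any function that maps an observation to a button name.
Model = Callable[[Observation], str]


def scripted_model(obs: Observation) -> str:
    """Placeholder for an LLM call: alternates between A and UP."""
    count = int(obs.context.rsplit(" ", 1)[-1])
    return "A" if count % 2 == 0 else "UP"


def run_harness(emulator: FakeEmulator, model: Model, steps: int) -> List[str]:
    """Observe, let the model decide, press the chosen button; repeat."""
    for _ in range(steps):
        obs = emulator.observe()
        button = model(obs)
        emulator.press(button)
    return list(emulator.pressed)


if __name__ == "__main__":
    print(run_harness(FakeEmulator(), scripted_model, 4))
```

Because each harness decides what goes into the context and which tools the model can use, two models running under different harnesses are not playing under the same conditions, which is the reason Joel Z gives for not treating the result as a benchmark.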
