
Gemini Robotics 1.5 Unveils Advanced AI Agents for Smarter, Safer Physical Tasks

Gemini Robotics 1.5 marks a major leap forward in bringing AI agents into the physical world, enabling robots to perceive, plan, think, use tools, and act to complete complex, multi-step tasks. This advancement builds on earlier progress with the Gemini Robotics family of models and introduces a new agentic framework designed to make robots more intelligent, adaptable, and capable.

At the core of this innovation are two specialized models working in tandem. Gemini Robotics-ER 1.5 serves as the embodied reasoning model, acting like a high-level brain that orchestrates robot behavior. It excels at spatial reasoning, logical decision-making, natural language interaction, and estimating task progress and success. It can also call tools such as Google Search or third-party functions to gather real-time information, making it highly effective in dynamic environments. The second model, Gemini Robotics 1.5, is the vision-language-action model. It interprets the robot's surroundings using vision and language, translates high-level instructions from the reasoning model into precise physical actions, and can explain its decision-making process in natural language, increasing transparency and trust. Together, these models allow robots to break complex tasks into manageable steps, adapt to changing conditions, and generalize across different environments and robot types.

Remarkably, Gemini Robotics 1.5 demonstrates strong cross-embodiment learning. Tasks trained on one robot, such as ALOHA 2, can be executed on entirely different robots, like Apptronik's Apollo humanoid or the bi-arm Franka robot, without retraining, which accelerates deployment and reduces development time.

Gemini Robotics-ER 1.5 has achieved state-of-the-art performance across 15 academic benchmarks in embodied reasoning, including Point-Bench, Where2Place, BLINK, and VSI-Bench. Its ability to reason about safety, context, and task goals significantly improves both task success and environmental awareness.

Safety is a central focus in this development. The models are designed with a holistic safety approach, incorporating high-level semantic reasoning, alignment with Google's AI Principles, and integration with low-level safety systems such as collision avoidance. Google has also upgraded the ASIMOV benchmark to better evaluate semantic safety, with enhanced data coverage, improved annotations, new question types, and video modalities. Gemini Robotics-ER 1.5 performs exceptionally well on this benchmark, demonstrating advanced understanding of both physical and ethical safety constraints.

Gemini Robotics-ER 1.5 is available today via the Gemini API in Google AI Studio and is currently being shared with select partners. Developers are encouraged to explore the new capabilities through the Google Developers blog.

This milestone represents a foundational step toward artificial general intelligence in the physical world. By empowering robots to think, plan, and act autonomously, Google is paving the way for intelligent systems that can better assist humans in real-world settings, transforming industries from manufacturing and logistics to healthcare and home assistance.
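The two-model split described above, in which an embodied-reasoning model plans and tracks a task while a vision-language-action model carries out each sub-step, can be sketched in a few lines of Python. Every class and method name here is hypothetical and purely illustrative; this is not the Gemini API, just a minimal sketch of the orchestrator/executor pattern the article describes.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One natural-language sub-task produced by the reasoning model."""
    instruction: str
    done: bool = False


class EmbodiedReasoner:
    """Stand-in for the high-level reasoner (Gemini Robotics-ER 1.5's role):
    decomposes a task into steps and estimates overall progress."""

    def plan(self, task: str) -> list[Step]:
        # A real model would reason over camera images and language;
        # here we just fabricate a fixed three-step decomposition.
        return [Step(f"{task}: step {i + 1}") for i in range(3)]

    def estimate_success(self, steps: list[Step]) -> float:
        # Fraction of sub-steps completed, mirroring the article's
        # "estimating task progress and success".
        return sum(s.done for s in steps) / len(steps)


class VisionLanguageActor:
    """Stand-in for the vision-language-action model (Gemini Robotics 1.5's
    role): turns one instruction into motion and reports what it did."""

    def act(self, step: Step) -> str:
        step.done = True
        # The real model can also explain its decisions in natural language.
        return f"executed '{step.instruction}'"


def run_task(task: str) -> float:
    """Orchestration loop: plan, execute each step, report progress."""
    reasoner, actor = EmbodiedReasoner(), VisionLanguageActor()
    steps = reasoner.plan(task)
    for step in steps:
        print(actor.act(step))
    return reasoner.estimate_success(steps)
```

In this toy loop, `run_task("clear the table")` executes all three fabricated sub-steps and returns a progress estimate of 1.0. The real system closes this loop with perception and low-level control in between, but the division of labor is the same: the reasoner decides *what* to do next, the actor decides *how*.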
