
Google's New AI Model Enables Local Robot Operation

Google has announced Gemini Robotics On-Device, which it describes as the first of its robotics AI models able to run entirely on the robot's own hardware, with no cloud connection required. According to Carolina Parada, who leads the robotics team at Google DeepMind, it is also the company's first vision-language-action (VLA) model that developers can fine-tune locally, adapting it to their own requirements and scenarios so it can learn new skills quickly.

That adaptability shows up as what Google calls "rapid task adaptation": after only 50 to 100 demonstrations, typically collected by physically guiding or teleoperating the robot, the model can recognize and manipulate new objects with precision.

To demonstrate its versatility, Google has tested the model on a variety of robots: the ALOHA two-armed research platform developed by Google itself, a two-armed collaborative robot from Franka Emika, and the humanoid robot Apollo from Apptronik. In one demonstration, Apollo followed verbal commands to place a black T-shirt into a laundry basket and to drop a toy magic wand into a toy box, showing accurate object recognition and smooth task execution.

The emphasis on capability also raises safety concerns. When a powerful AI model is given control over real-world physical objects, making its actions safe, controllable, and predictable becomes paramount. Parada acknowledges that generative models exhibit a degree of randomness in their outputs; in a conversational chatbot that randomness produces harmless quirks, but in a physical robot it could lead to serious safety issues.

To address this, the On-Device release is the core VLA model only and does not include a complete safety framework. Google recommends that developers replicate the multi-layer safety strategy used by its internal team: route user commands through the Gemini Live API for real-time language and content audits that filter out unsafe or inappropriate instructions, and implement an onboard safety controller that monitors and limits the robot's range of motion and force levels as the final line of defense. Google also encourages developers to adopt its published safety standards and to red-team the system, deliberately probing for and fixing potential failures, before deployment.

Access is currently through a "trusted tester program": experienced developers and researchers can apply for the Gemini Robotics On-Device model and its software development kit (SDK). The SDK includes the MuJoCo physics simulator, so developers can test and fine-tune extensively in simulation before moving to real-world hardware.

This release is only the beginning. The current version of Gemini Robotics is based on the Gemini 2.0 architecture, and Google's robotics team typically trails the mainline Gemini releases by a version; given the advances in the latest Gemini 2.5, future iterations of the robotics model can be expected to offer stronger performance and capabilities.

The simplified sketches below illustrate what a few of these pieces (demonstration-based fine-tuning, command auditing, an onboard safety limiter, and simulation testing) might look like in code.
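
Google has not published the Gemini Robotics SDK's fine-tuning interface, so the snippet below is only a generic behavior-cloning sketch of the underlying idea: adapting a pretrained policy on a small set of teleoperated demonstrations recorded as (observation, action) pairs. Everything here (the tiny policy network, the random stand-in data, the hyperparameters) is hypothetical, written in PyTorch purely for illustration.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for a pretrained vision-language-action policy.
# A real VLA conditions on camera images and language; flat feature
# vectors are used here purely to keep the fine-tuning loop visible.
policy = nn.Sequential(
    nn.Linear(128, 256), nn.ReLU(),
    nn.Linear(256, 7),  # e.g. a 7-DoF arm action
)

# 50-100 teleoperated demos, flattened into (observation, action) pairs;
# random tensors stand in for real demonstration data.
observations = torch.randn(80 * 50, 128)  # ~80 demos x ~50 timesteps
actions = torch.randn(80 * 50, 7)
loader = DataLoader(TensorDataset(observations, actions),
                    batch_size=64, shuffle=True)

optimizer = torch.optim.AdamW(policy.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

# Behavior cloning: regress the demonstrated actions from observations.
for epoch in range(10):
    for obs, act in loader:
        loss = loss_fn(policy(obs), act)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.4f}")
```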
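
For the recommended command audit, the Gemini Live API itself is a streaming interface; as a rough approximation of the same idea, here is a pre-execution text audit using the standard google-genai Python SDK. The audit prompt and the SAFE/UNSAFE protocol are assumptions of this sketch, not anything Google has published.

```python
from google import genai

client = genai.Client()  # reads the API key from the environment

AUDIT_PROMPT = (
    "You are a safety auditor for a household robot. "
    "Answer with exactly SAFE or UNSAFE.\n"
    "Command: {command}"
)

def command_is_safe(command: str) -> bool:
    """Ask a cloud Gemini model to vet a user command before the
    on-device policy is allowed to act on it."""
    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents=AUDIT_PROMPT.format(command=command),
    )
    return response.text.strip().upper().startswith("SAFE")

command = "Put the black T-shirt in the laundry basket"
if command_is_safe(command):
    print("forwarding command to the on-device policy")
else:
    print("command rejected by the safety audit")
```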
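
The onboard safety controller is the deterministic last line of defense: a layer between the policy's outputs and the actuators that clamps every command into a known-safe envelope, whatever the model asked for. A minimal pure-Python sketch, with made-up joint limits:

```python
from dataclasses import dataclass

@dataclass
class SafetyLimits:
    max_joint_velocity: float = 0.5  # rad/s, hypothetical limit
    max_torque: float = 10.0         # N*m, hypothetical limit

def clamp(value: float, limit: float) -> float:
    return max(-limit, min(limit, value))

def safety_filter(velocities: list[float], torques: list[float],
                  limits: SafetyLimits) -> tuple[list[float], list[float]]:
    """Clamp per-joint commands to the configured envelope before they
    reach the motors, regardless of what the policy requested."""
    safe_vel = [clamp(v, limits.max_joint_velocity) for v in velocities]
    safe_tau = [clamp(t, limits.max_torque) for t in torques]
    return safe_vel, safe_tau

# Example: the middle joint's commands exceed both limits and get clipped.
vel, tau = safety_filter([0.2, -0.9, 0.1], [4.0, 25.0, -3.0], SafetyLimits())
print(vel)  # [0.2, -0.5, 0.1]
print(tau)  # [4.0, 10.0, -3.0]
```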
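
Finally, the MuJoCo simulator bundled with the SDK is the same open-source engine available to anyone (pip install mujoco). The toy scene below is unrelated to any Gemini tooling; it just shows the basic load-and-step workflow that simulated testing builds on:

```python
import mujoco

# A trivial scene: a 10 cm box dropped onto a ground plane.
SCENE_XML = """
<mujoco>
  <worldbody>
    <geom type="plane" size="1 1 0.1"/>
    <body name="box" pos="0 0 1">
      <freejoint/>
      <geom type="box" size="0.1 0.1 0.1" mass="1"/>
    </body>
  </worldbody>
</mujoco>
"""

model = mujoco.MjModel.from_xml_string(SCENE_XML)
data = mujoco.MjData(model)

# Step the physics for one simulated second (default timestep is 2 ms).
for _ in range(500):
    mujoco.mj_step(model, data)

# The box should have fallen and settled on the plane (z close to 0.1 m).
print("box position:", data.body("box").xpos)
```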

Source: https://deepmind.google/discover/blog/gemini-robotics-on-device-brings-ai-to-local-robotic-devices/