
Learning While Deploying

Learning While Deploying (LWD) was proposed in 2026 by researchers from the Shanghai Institute for Innovation, AIZ Robotics, and Columbia University. The findings were published in the paper "Learning while Deploying: Fleet-Scale Reinforcement Learning for Generalist Robot Policies."

LWD is a scalable framework that combines large-scale fleet deployment with offline-to-online reinforcement learning. It addresses the distribution shift and long-tail failures that general vision-language-action (VLA) models face when performing tasks in complex real-world environments, problems that stem from relying on offline pre-training data alone. The framework introduces Distributed Implicit Value Learning (DIVL) and adjoint-matching-based Q-learning (QAM) to continuously aggregate data from autonomous robot interactions and human interventions during real-world deployment, so the policy model can iterate stably without drifting away from its actual application scenarios.

The results show that LWD overcomes the learning bottleneck caused by sparse rewards and significantly improves the adaptability and generalization of general-purpose models across diverse real-world physical environments. In eight complex real-world embodied scenarios, including supermarket stocking, tea brewing, and cocktail mixing, a single general-purpose policy achieved an average task success rate of up to 95% and substantially reduced the execution time of long-horizon tasks.
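The deploy-aggregate-train cycle described above can be sketched in miniature. This is a hedged illustration of the general offline-to-online pattern only, not the paper's actual algorithm or API: the names `fleet_rollout`, `ReplayBuffer`, and `update_policy` are assumptions, the "robots" are stand-in integers, and the update is a placeholder rather than real DIVL/QAM value learning.

```python
import random

class ReplayBuffer:
    """Aggregates transitions from every robot in the fleet."""
    def __init__(self):
        self.transitions = []

    def add(self, batch):
        self.transitions.extend(batch)

    def sample(self, n):
        return random.sample(self.transitions, min(n, len(self.transitions)))

def fleet_rollout(policy, n_robots=4):
    """Stand-in for real-world deployment: each robot contributes
    (state, action, reward, next_state) tuples. In LWD, human-intervention
    corrections would be tagged and mixed into the same buffer."""
    return [(s, policy(s), 0.0, s + 1) for s in range(n_robots)]

def update_policy(weight, batch, lr=0.1):
    # Placeholder gradient step. A real system would run value learning
    # (e.g. implicit value estimation) plus a constrained policy update.
    return weight + lr * len(batch)

buffer = ReplayBuffer()
weight = 0.0
policy = lambda s: s * weight  # toy linear "policy"

# Offline phase: seed the buffer with pre-collected data.
buffer.add(fleet_rollout(policy))

# Online phase: alternate deployment and training, so the policy keeps
# learning from the same distribution it is deployed in.
for _ in range(3):
    buffer.add(fleet_rollout(policy))
    weight = update_policy(weight, buffer.sample(8))
```

The key structural point the sketch preserves is that training data always comes from current deployments, which is how the framework avoids the distribution shift of purely offline pre-training.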
