
Learning While Deploying

Learning While Deploying (LWD) was proposed in 2026 by researchers from the Shanghai Institute for Innovation, AIZ Robotics, and Columbia University. The findings were published in the paper "Learning while Deploying: Fleet-Scale Reinforcement Learning for Generalist Robot Policies."

LWD is a scalable framework that combines large-scale fleet deployment with offline-to-online reinforcement learning. It addresses the distribution shift and long-tail failures that general vision-language-action (VLA) models face when performing tasks in complex real-world environments, problems that stem from relying on offline pre-training data alone. The framework introduces Distributed Implicit Value Learning (DIVL) and adjoint-matching-based Q-learning (QAM) to continuously aggregate data from autonomous robot interactions and human interventions during real-world deployment, so the policy model can iterate stably without drifting away from its actual application scenarios.

The results show that LWD overcomes the learning bottleneck caused by sparse rewards and significantly improves the adaptability and generalization of general-purpose models across diverse real-world physical environments. In eight complex real-world embodied scenarios, including supermarket stocking, tea brewing, and cocktail mixing, a single general-purpose policy achieved an average task success rate of up to 95% and substantially reduced the execution time of long-horizon tasks.
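The deploy-aggregate-train cycle described above can be sketched in miniature. This is a hedged illustration of the general offline-to-online pattern only, not the paper's actual algorithm or API: the names `fleet_rollout`, `ReplayBuffer`, and `update_policy` are assumptions, the "robots" are stand-in integers, and the update is a placeholder rather than real DIVL/QAM value learning.

```python
import random

class ReplayBuffer:
    """Aggregates transitions from every robot in the fleet."""
    def __init__(self):
        self.transitions = []

    def add(self, batch):
        self.transitions.extend(batch)

    def sample(self, n):
        return random.sample(self.transitions, min(n, len(self.transitions)))

def fleet_rollout(policy, n_robots=4):
    """Stand-in for real-world deployment: each robot contributes
    (state, action, reward, next_state) tuples. In LWD, human-intervention
    corrections would be tagged and mixed into the same buffer."""
    return [(s, policy(s), 0.0, s + 1) for s in range(n_robots)]

def update_policy(weight, batch, lr=0.1):
    # Placeholder gradient step. A real system would run value learning
    # (e.g. implicit value estimation) plus a constrained policy update.
    return weight + lr * len(batch)

buffer = ReplayBuffer()
weight = 0.0
policy = lambda s: s * weight  # toy linear "policy"

# Offline phase: seed the buffer with pre-collected data.
buffer.add(fleet_rollout(policy))

# Online phase: alternate deployment and training, so the policy keeps
# learning from the same distribution it is deployed in.
for _ in range(3):
    buffer.add(fleet_rollout(policy))
    weight = update_policy(weight, buffer.sample(8))
```

The key structural point the sketch preserves is that training data always comes from current deployments, which is how the framework avoids the distribution shift of purely offline pre-training.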
