HyperAIHyperAI

Command Palette

Search for a command to run...

Contrôle prédictif piloté par les données et à déclenchement d'événement, amélioré par l'apprentissage par renforcement profond, pour un bras robotique souple actionné par câbles 3D

Cheng Ouyang Moeen Ul Islam Kaixiang Zhang Zhaojian Li Xiaobo Tan Dong Chen

Résumé

Les robots souples sont difficiles à contrôler en raison de leurs dynamiques non linéaires et variables dans le temps. Le contrôle prédictif alimenté par les données (DeePC) offre une alternative sans modèle en exploitant directement les trajectoires entrée-sortie mesurées pour construire un contrôleur prédictif. Cependant, sa formulation à horizon glissant nécessite la résolution d'un problème d'optimisation sous contraintes à chaque instant d'échantillonnage, ce qui peut être exigeant en calcul pour un déploiement en temps réel sur des plateformes robotiques aux ressources limitées. Pour remédier à cette limitation, nous proposons un cadre DeePC événementiel adaptatif basé sur l'apprentissage par renforcement (RL-ET-DeePC) pour le contrôle des robots souples. Une politique RL sans modèle est entraînée pour déterminer quand invoquer l'optimiseur DeePC en fonction de la représentation actuelle de l'état du système, réduisant ainsi les appels d'optimisation inutiles tout en préservant les performances en boucle fermée. Les résultats de simulation montrent que RL-ET-DeePC réduit la fréquence d'optimisation jusqu'à 66 % par rapport au DeePC périodique, tout en maintenant une précision de suivi comparable. Des expériences sur matériel réel menées sur un bras robotique souple à câbles tridimensionnel démontrent un transfert zero-shot, atteignant une réduction de 34 % de la fréquence d'optimisation avec une précision de suivi comparable à celle du DeePC périodique et des performances plus stables qu'une ligne de base événementielle à seuil statique.

One-sentence Summary

the paper propose RL-ET-DeePC, an event-triggered data-enabled predictive control framework that employs a model-free reinforcement learning policy to dynamically activate the optimizer, reducing optimization frequency by 66% in simulation and 34% during zero-shot transfer to a 3D cable-driven soft robotic arm while maintaining tracking accuracy comparable to periodic methods.

Key Contributions

  • Introduces an adaptive reinforcement-learning-based event-triggered DeePC (RL-ET-DeePC) framework that addresses the computational demands of real-time soft robotic control by adaptively scheduling the predictive optimizer.
  • Develops a model-free reinforcement learning policy that evaluates the current system state to dynamically trigger the optimizer only when necessary, thereby eliminating redundant calculations while preserving closed-loop tracking performance.
  • Validates the framework through simulations and hardware experiments on a three-dimensional cable-driven soft robotic arm, demonstrating up to a 66% reduction in optimization frequency during simulation and a 34% reduction during zero-shot hardware deployment while maintaining tracking accuracy comparable to periodic methods and superior consistency over static threshold baselines.

Introduction

Controlling soft robots is inherently challenging due to their highly nonlinear and time-varying dynamics, which makes traditional model-based approaches impractical for real-world deployment. Data-enabled predictive control (DeePC) offers a promising model-free alternative by directly using measured input-output trajectories to construct predictive controllers, yet its receding-horizon design requires solving a constrained optimization problem at every sampling instant. This computational burden creates a significant bottleneck for resource-limited robotic platforms that demand low-latency control. To address this, the authors introduce an adaptive reinforcement learning-based event-triggered DeePC framework that dynamically schedules optimizer calls based on real-time system states. By training a model-free reinforcement learning policy to activate the controller only when necessary, they substantially cut computational overhead while preserving tracking accuracy, as validated through both simulations and zero-shot hardware experiments on a three-dimensional cable-driven soft arm.

Method

The authors propose a control framework tailored for cable-driven soft robotic arms, such as the fabricated structure shown in the figure below.

The core methodology is an RL-ET-DeePC (Reinforcement Learning Event-Triggered Data-Enabled Predictive Control) architecture. As illustrated in the framework diagram, the system is divided into two primary stages: pre-data preprocessing and an RL-based event-triggered control loop.

Pre-Data Preprocessing and SVD-DeePC The method utilizes a data-driven approach to model the system dynamics without explicit identification of state-space matrices. The authors collect offline input-output trajectories, denoted as udu^{\mathrm{d}}ud and ydy^{\mathrm{d}}yd, and arrange them into block Hankel matrices of depth LLL. According to Willems's Fundamental Lemma, the column span of this Hankel matrix characterizes all admissible trajectories of the underlying linear time-invariant system.

HL=[HL(ud)][HL(yd)]\mathcal { H } _ { L } = \frac { \left[ \mathcal { H } _ { L } ( u ^ { \mathrm { d } } ) \right] } { \left[ \mathcal { H } _ { L } ( y ^ { \mathrm { d } } ) \right] }HL=[HL(yd)][HL(ud)]

However, soft robots require extensive data collection, resulting in high-dimensional Hankel matrices that are computationally expensive to process in real-time. To mitigate this, the authors apply Singular Value Decomposition (SVD) to compress the data. The concatenated Hankel matrix is factorized, and the matrix is truncated to rank rrr, retaining only the dominant singular values. This preserves the principal dynamical modes while significantly reducing the dimensionality of the decision variables. The resulting reduced Hankel matrix HˉL\bar{\mathcal{H}}_LHˉL allows any valid trajectory to be approximated as:

[u[1,L]y[1,L]]=HˉLgˉ\left[ \begin{array} { l } { u _ { [ 1 , L ] } } \\ { y _ { [ 1 , L ] } } \end{array} \right] = \bar { \mathcal { H } } _ { L } \bar { g }[u[1,L]y[1,L]]=HˉLgˉ

where gˉ\bar{g}gˉ is a lower-dimensional decision vector.

RL-Based Event-Triggered Control Loop The control loop, depicted in the lower section of the framework diagram, integrates this compressed data representation with a reinforcement learning agent to manage computational costs. At each time step ttt, the system state is derived from the tracking error ete_tet and its variation Δet\Delta e_tΔet. This state is fed into the RL Agent, which outputs a binary action at{0,1}a_t \in \{0, 1\}at{0,1}.

If the action is 111, the system triggers the SVD-DeePC Optimizer. This module solves a regularized optimization problem to minimize tracking error and control effort subject to the data-driven constraints. The optimization problem includes a slack variable to handle measurement noise and regularization terms to prevent overfitting.

mingˉ,u,y,σyyyrQ2+uR2+λyσy22+λggˉ22subject toHˉLgˉ=[uiniuyiniy]+[00σy0]\begin{array} { r l } { \underset { \bar { g } , u , y , \sigma _ { y } } { \mathrm { m i n } } } & { \| y - y _ { r } \| _ { Q } ^ { 2 } + \| u \| _ { R } ^ { 2 } + \lambda _ { y } \| \sigma _ { y } \| _ { 2 } ^ { 2 } + \lambda _ { g } \| \bar { g } \| _ { 2 } ^ { 2 } } \\ { \mathrm { s u b j e c t ~ t o } } & { \bar { \mathcal { H } } _ { L } \bar { g } = \left[ \begin{array} { l } { u _ { \mathrm { ini } } } \\ { u } \\ { y _ { \mathrm { ini } } } \\ { y } \end{array} \right] + \left[ \begin{array} { l } { 0 } \\ { 0 } \\ { \sigma _ { y } } \\ { 0 } \end{array} \right] } \end{array}gˉ,u,y,σyminsubject toyyrQ2+uR2+λyσy22+λggˉ22HˉLgˉ=uiniuyiniy+00σy0

The optimizer produces an optimal control sequence, the first element of which is applied to the robot, and the remaining elements are stored in the Control Sequence Buffer.

If the action is 000, the optimization is skipped to save computation. Instead, the controller retrieves the next control input from the Control Sequence Buffer. This allows the system to utilize the prediction horizon from the previous optimization step, transforming conventional receding-horizon control into an event-driven mechanism.

Supervisory Override Mechanism To ensure robustness, a Supervisor module acts as a safeguard within the control architecture. It monitors the Control Sequence Buffer and the time elapsed since the last update. If the buffer is empty or the time since the last optimization exceeds the prediction horizon minus one (ttlastN1t - t_{\text{last}} \geq N - 1ttlastN1), the Supervisor forces a trigger (δt=1\delta_t = 1δt=1), overriding the RL agent's decision. This ensures that the controller always has a valid control sequence available and prevents excessive open-loop execution. The effective triggering signal δt\delta_tδt determines whether the SVD-DeePC Optimizer runs or the buffered input is used.

Training and Interaction The RL agent is trained to balance tracking accuracy and computational efficiency. The reward function penalizes both the tracking error and the computational cost of triggering the optimizer. The parameter ρ\rhoρ in the reward function controls the trade-off, encouraging the agent to learn a state-dependent policy that triggers optimization only when necessary. The agent interacts with the environment in episodes, updating its policy based on the observed states, actions, and rewards. This framework is algorithm-agnostic, allowing for the integration of various RL approaches such as DQN, PPO, or A2C.

Experiment

The evaluation combines high-fidelity simulations and physical experiments on a cable-driven soft robotic arm to assess the proposed RL-ET-DeePC framework. Simulation tests validate the controller's ability to balance tracking accuracy with computational efficiency while generalizing to unseen trajectories. Hardware experiments further confirm that the learned policy successfully transfers to real-world conditions, maintaining stable control and context-aware triggering without relying on fixed error thresholds. Overall, the framework effectively reduces optimization frequency by prioritizing updates during dynamic transitions, preserving tracking fidelity comparable to periodic control while adapting intelligently to motion context.

The authors evaluate the RL-ET-DeePC framework on a physical soft robotic arm, comparing it against a periodic DeePC baseline. The results demonstrate a clear trade-off controlled by the penalty parameter: increasing the penalty significantly reduces the frequency of controller updates while maintaining acceptable tracking accuracy. The periodic DeePC controller serves as the high-accuracy baseline, whereas the RL-based approach effectively lowers computational burden by adapting when to re-optimize. Increasing the penalty parameter leads to a substantial decrease in the trigger rate, reducing optimization calls compared to the periodic baseline. The RL-ET-DeePC method maintains tracking accuracy comparable to the periodic DeePC controller, even with reduced update frequencies. The learned policy successfully transfers from simulation to hardware, preserving the desired accuracy-efficiency trade-off under real-world conditions.

The authors compare reinforcement learning-based event-triggered DeePC methods including DQN, A2C, and PPO with a periodic DeePC baseline. Increasing the penalty parameter rho consistently reduces the triggering frequency and total computation time for all RL algorithms, while slightly increasing the tracking error. PPO demonstrates the most efficient performance, achieving the greatest reduction in decision time while maintaining acceptable tracking accuracy relative to the other RL methods. Increasing the penalty parameter rho results in lower triggering rates and reduced computational time for all RL algorithms. PPO achieves the lowest total decision time among the RL methods, outperforming DQN and A2C in efficiency. The periodic DeePC baseline maintains the highest triggering rate and accuracy, serving as the performance ceiling that RL methods approximate with reduced computational cost.

The experiments evaluate a reinforcement learning-based event-triggered DeePC framework on a physical soft robotic arm, benchmarking it against a periodic DeePC baseline to assess computational efficiency and tracking performance. By adjusting a penalty parameter, the study demonstrates that the RL approach successfully balances reduced optimization frequency with maintained tracking accuracy, effectively lowering computational demands without compromising control quality. Among the tested algorithms, PPO proves most efficient, while the entire framework validates a robust simulation-to-hardware transfer that preserves the desired accuracy-efficiency trade-off under real-world conditions.


Créer de l'IA avec l'IA

De l'idée au lancement — accélérez votre développement IA avec le co-codage IA gratuit, un environnement prêt à l'emploi et le meilleur prix pour les GPU.

Codage assisté par IA
GPU prêts à l’emploi
Tarifs les plus avantageux

HyperAI Newsletters

Abonnez-vous à nos dernières mises à jour
Nous vous enverrons les dernières mises à jour de la semaine dans votre boîte de réception à neuf heures chaque lundi matin
Propulsé par MailChimp