
Autonomous Overtaking Trajectory Optimization Using Reinforcement Learning and Opponent Pose Estimation

Matej Rene Cihlar, Luka Šiktar, Branimir Ćaran, Marko Švaco

Abstract

Overtaking another vehicle is one of the most complex maneuvers for an autonomous car. To achieve optimal autonomous overtaking, driving systems rely on multiple sensors to optimize safe trajectories and improve overtaking efficiency. This paper proposes a reinforcement learning mechanism that optimizes overtaking trajectories in a multi-agent autonomous racing environment based on LiDAR and depth-image data. The developed reinforcement learning agent uses pre-generated raceline data together with sensor inputs to compute the steering angle and linear velocity for an optimal overtake. The system combines a LiDAR with a 2D detection algorithm and a depth camera with YOLO-based object detection to identify the vehicle to be overtaken and estimate its pose. The LiDAR and depth-camera detections are fused with an Unscented Kalman Filter (UKF), improving opponent pose estimation accuracy in racing scenarios and enabling overtaking trajectory optimization. As a result, the proposed algorithm successfully executed overtaking maneuvers in both simulation and real-world experiments, achieving pose estimation RMSEs of 0.0816 m and 0.0531 m in the x and y directions, respectively.

One-sentence Summary

Researchers from the University of Zagreb propose a multi-agent reinforcement learning framework using PPO to optimize autonomous overtaking in racing scenarios. By fusing LiDAR and YOLOv8 depth data via an Unscented Kalman Filter for robust opponent pose estimation, their system achieves successful real-world maneuvers on the F1TENTH platform, surpassing single-agent limitations.

Key Contributions

  • The paper introduces a reinforcement learning agent using PPO that computes steering and velocity for optimal overtaking by integrating pre-generated raceline data with real-time sensor inputs in multi-agent racing environments.
  • A sensor fusion system combining 2D LiDAR detection and YOLOv8-based depth camera data is implemented with an Unscented Kalman Filter to generate robust opponent pose estimates for trajectory planning.
  • Experimental validation on a real-world F1TENTH platform demonstrates successful overtaking maneuvers in both simulation and physical tests, achieving a pose estimation RMSE of (0.0816, 0.0531) m in (x, y) coordinates.

Introduction

Autonomous overtaking in multi-agent racing environments demands precise trajectory optimization to balance speed and safety under dynamic, adversarial conditions. Prior approaches often rely on single-agent reinforcement learning that follows static racing lines or depend on simulation without real-world validation, while existing sensor fusion methods have not been fully adapted for the high-stakes context of autonomous racing. The authors leverage a reinforcement learning agent trained with PPO that incorporates real-time opponent pose estimation derived from fusing 2D LiDAR and depth camera data via an Unscented Kalman Filter. This system enables the vehicle to compute optimal steering and velocity commands for overtaking maneuvers, which the team successfully validated on a real-world F1TENTH platform with low pose estimation error.

Method

The proposed system integrates two primary components: pose estimation via sensor fusion and a reinforcement learning algorithm designed for overtaking maneuvers. The physical implementation relies on the F1TENTH platform equipped with a Jetson Xavier NX computer for processing. Sensor inputs are provided by a Hokuyo 2D LiDAR and an Intel RealSense D435 depth camera. The final hardware configuration is illustrated in the figure below.

Opponent pose estimation is performed relative to the RL agent using the on-board sensors. The LiDAR data undergoes object detection through clustering and rectangle fitting, where the most likely opponent cluster is selected based on the previous timestep estimate. Simultaneously, the depth camera utilizes both RGB and depth data. For RGB processing, a YOLOv8 object detection model identifies the opponent car's bounding box. Distance estimation leverages the bounding box height using a reciprocal function.
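The clustering-and-selection step can be illustrated with a minimal sketch. The paper's exact clustering and rectangle-fitting procedure is not detailed here, so this example uses a simple Euclidean break-point rule (consecutive scan points farther apart than a threshold start a new cluster) and picks the cluster whose centroid lies closest to the previous timestep's estimate; the 0.3 m gap threshold and the helper names are assumptions for illustration.

```python
import numpy as np

def cluster_scan(points, gap=0.3):
    """Split ordered 2D scan points into clusters wherever the distance
    between consecutive points exceeds a gap threshold (metres)."""
    clusters, current = [], [points[0]]
    for p in points[1:]:
        if np.linalg.norm(p - current[-1]) > gap:
            clusters.append(np.array(current))
            current = []
        current.append(p)
    clusters.append(np.array(current))
    return clusters

def pick_opponent(clusters, prev_estimate):
    """Choose the cluster whose centroid is closest to the previous
    timestep's opponent position estimate."""
    centroids = [c.mean(axis=0) for c in clusters]
    dists = [np.linalg.norm(c - prev_estimate) for c in centroids]
    best = int(np.argmin(dists))
    return clusters[best], centroids[best]

# Two synthetic objects visible in one scan
scan = np.array([[1.0, 0.0], [1.05, 0.02], [1.1, 0.05],
                 [3.0, 1.0], [3.02, 1.05]])
clusters = cluster_scan(scan)
_, centroid = pick_opponent(clusters, prev_estimate=np.array([2.9, 1.0]))
```

Tracking the cluster nearest the previous estimate is what keeps the detector from jumping to walls or other obstacles between frames.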

$$\mathrm{distance} = \frac{a}{b \cdot \mathrm{height} + b} + c$$

The parameters in this equation are determined experimentally. This method remains valid as the opponent is consistently viewed from the side, meaning height variation correlates primarily with distance rather than orientation. The yaw angle is approximated using the Ackermann steering model. Depth data is extracted from the bounding box center pixel location, with an average offset added to account for the measurement point on the opponent car. These measurements are fused using an Unscented Kalman Filter (UKF). The UKF state vector is defined as $[x, y, v_x, v_y]$, using a constant velocity model. Since the sensors operate at different publishing rates and formats, the filter updates on every measurement with static transformations applied for the sensor frames.
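A minimal sketch of the fusion logic: the paper uses a UKF, but because the constant-velocity model and position-only measurements are linear, the sigma-point machinery reduces to the standard Kalman predict/update equations, so a plain linear Kalman filter suffices to illustrate asynchronous fusion of two sensors with different noise levels. The noise values, rates, and measurement positions below are illustrative assumptions, not the paper's tuning.

```python
import numpy as np

def predict(x, P, dt, q=0.1):
    """Constant-velocity prediction for the state [x, y, vx, vy]."""
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], float)
    Q = q * np.eye(4)  # simplified process noise
    return F @ x, F @ P @ F.T + Q

def update(x, P, z, r):
    """Position-only measurement update with noise variance r."""
    H = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0]], float)
    R = r * np.eye(2)
    y = z - H @ x                       # innovation
    S = H @ P @ H.T + R                 # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
    return x + K @ y, (np.eye(4) - K @ H) @ P

x, P = np.zeros(4), np.eye(4)
# Opponent truly near (2.0, 1.0); the LiDAR-like sensor is less noisy
# than the camera-like one, so it dominates the fused estimate.
for _ in range(20):
    x, P = predict(x, P, dt=0.1)
    x, P = update(x, P, z=np.array([2.0, 1.0]), r=0.01)  # LiDAR-like
    x, P = update(x, P, z=np.array([2.1, 0.9]), r=0.10)  # camera-like
```

Because each `update` call is independent, measurements from either sensor can be applied whenever they arrive, which mirrors the per-measurement update strategy described above.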

For the decision-making process, the authors implement a multi-agent reinforcement learning (MARL) algorithm to learn overtaking maneuvers. The Proximal Policy Optimization (PPO) algorithm is selected to ensure training stability through a clipped loss function.

$$L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(r_t(\theta)\hat{A}_t,\ \mathrm{clip}(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon)\,\hat{A}_t\right)\right]$$

Here, $r_t(\theta)$ represents the probability ratio between the new and old policies, $\hat{A}_t$ is the advantage estimate at time step $t$, and $\epsilon$ denotes the clipping-range hyperparameter. The neural network architecture consists of a fully connected feedforward network with two hidden layers containing 256 neurons each, activated by ReLU functions. The input layer size is 162, corresponding to the observation size, while the output layer size is 2 with a tanh activation function. In the multi-agent setup, the opponent's actions are generated by the same neural network policy, and data from both agents contribute to the training process.
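The clipped surrogate can be expressed in a few lines of NumPy. This sketch evaluates the per-timestep objective for illustrative ratio and advantage values; the batch values and $\epsilon = 0.2$ are assumptions, not the paper's hyperparameters.

```python
import numpy as np

def clipped_surrogate(ratio, advantage, eps=0.2):
    """PPO clipped objective: min(r*A, clip(r, 1-eps, 1+eps)*A),
    averaged over the batch (the expectation E_t)."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * advantage
    return np.minimum(unclipped, clipped).mean()

# A ratio of 1.5 with positive advantage is clipped to 1.2,
# limiting how far a single update can push the policy.
ratios = np.array([1.5, 0.7, 1.0])
advantages = np.array([1.0, -1.0, 0.5])
objective = clipped_surrogate(ratios, advantages)
```

Taking the elementwise minimum makes the objective pessimistic: the policy gains nothing from moving the ratio outside $[1-\epsilon, 1+\epsilon]$, which is the source of PPO's training stability mentioned above.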

The environment is simulated using the F1TENTH gym, which models the car as a simple bicycle model and handles collisions via a 2D occupancy grid map. The simulation replicates the real-world LiDAR sensor with 1080 points per scan over 270 degrees. The observation space includes agent LiDAR data, odometry for both cars, and future waypoints. To reduce complexity, every 10th LiDAR point is sampled, resulting in 108 points describing surrounding obstacles. Waypoints are derived from a precalculated racing line generated using the minimum curvature approach. The action space comprises continuous steering angle and linear velocity commands. The steering angle is scaled to ± 0.34 rad, and linear velocity ranges from 0 to 3 m/s. The reward signal is a weighted sum of seven components including velocity, progress, overtaking status, raceline adherence, collision avoidance, heading error, and smoothness.
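The observation downsampling and action scaling described above can be sketched directly: the 1080-beam scan is thinned to every 10th beam, and the network's tanh outputs in [-1, 1] are mapped to the stated physical ranges. The shift-and-scale mapping for velocity (from [-1, 1] to [0, 3] m/s) is an assumption about the implementation.

```python
import numpy as np

MAX_STEER = 0.34   # rad, from the text
MAX_SPEED = 3.0    # m/s, from the text

def downsample_scan(scan):
    """Keep every 10th of the 1080 LiDAR beams -> 108 points."""
    return scan[::10]

def scale_action(tanh_out):
    """Map tanh outputs in [-1, 1] to physical commands."""
    steer = tanh_out[0] * MAX_STEER            # [-0.34, 0.34] rad
    speed = (tanh_out[1] + 1) / 2 * MAX_SPEED  # [0, 3] m/s (assumed mapping)
    return steer, speed

scan = np.random.default_rng(0).uniform(0.1, 10.0, 1080)
obs_scan = downsample_scan(scan)
steer, speed = scale_action(np.array([1.0, -1.0]))
```

Thinning the scan tenfold keeps the 162-dimensional observation (108 scan points plus odometry and waypoints) small enough for the two-layer policy network to train efficiently.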

Experiment

  • Opponent pose estimation using a UKF successfully tracks the opponent by fusing camera and LiDAR data, providing robust tracking that prevents the LiDAR from locking onto incorrect objects and enabling continuous updates at 10 Hz despite slower sensor rates.
  • The UKF demonstrates superior performance in the y-direction compared to individual sensors, though it is less accurate than LiDAR alone in the x-direction, with estimation error increasing as distance grows.
  • A real-world RL agent deployed on an F1TENTH car successfully executes overtaking maneuvers without collisions in conditions similar to its training environment, though trajectory oscillations are amplified by real-world noise and delays.
  • The agent fails to generalize to significantly different track geometries, such as smaller looped tracks, highlighting a dependency on training conditions for successful deployment.
