Autonomous Overtaking Trajectory Optimization Using Reinforcement Learning and Opponent Pose Estimation

Matej Rene Cihlar Luka Šiktar Branimir Ćaran Marko Švaco

Abstract

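Vehicle overtaking is one of the most complex driving maneuvers for autonomous vehicles. To achieve optimal autonomous overtaking, driving systems rely on multiple sensors that enable safe trajectory optimization and overtaking efficiency. This paper presents a reinforcement learning mechanism that enables overtaking trajectory optimization in multi-agent autonomous racing environments, based on LiDAR and depth image data. The developed reinforcement learning agent uses pre-generated raceline data and sensor inputs to compute the steering angle and linear velocity for optimal overtaking. The system uses a LiDAR with a 2D detection algorithm and a depth camera with YOLO-based object detection to identify the vehicle to be overtaken and its pose. The LiDAR and depth camera detections are fused with an Unscented Kalman Filter (UKF), which improves opponent pose estimation accuracy and enables trajectory optimization for overtaking in racing scenarios. The results show that the proposed algorithm successfully performs overtaking maneuvers in both simulation and real-world experiments, with a pose estimation RMSE of 0.0816 m and 0.0531 m in the x and y directions, respectively.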

One-sentence Summary

Researchers from the University of Zagreb propose a multi-agent reinforcement learning framework using PPO to optimize autonomous overtaking in racing scenarios. By fusing LiDAR and YOLOv8 depth data via an Unscented Kalman Filter for robust opponent pose estimation, their system achieves successful real-world maneuvers on the F1TENTH platform, surpassing single-agent limitations.

Key Contributions

  • The paper introduces a reinforcement learning agent using PPO that computes steering and velocity for optimal overtaking by integrating pre-generated raceline data with real-time sensor inputs in multi-agent racing environments.
  • A sensor fusion system combining 2D LiDAR detection and YOLOv8-based depth camera data is implemented with an Unscented Kalman Filter to generate robust opponent pose estimates for trajectory planning.
  • Experimental validation on a real-world F1TENTH platform demonstrates successful overtaking maneuvers in both simulation and physical tests, achieving a pose estimation RMSE of (0.0816, 0.0531) m in (x, y) coordinates.

Introduction

Autonomous overtaking in multi-agent racing environments demands precise trajectory optimization to balance speed and safety under dynamic, adversarial conditions. Prior approaches often rely on single-agent reinforcement learning that follows static racing lines or depend on simulation without real-world validation, while existing sensor fusion methods have not been fully adapted for the high-stakes context of autonomous racing. The authors leverage a reinforcement learning agent trained with PPO that incorporates real-time opponent pose estimation derived from fusing 2D LiDAR and depth camera data via an Unscented Kalman Filter. This system enables the vehicle to compute optimal steering and velocity commands for overtaking maneuvers, which the team successfully validated on a real-world F1TENTH platform with low pose estimation error.


Method

The proposed system integrates two primary components: pose estimation via sensor fusion and a reinforcement learning algorithm designed for overtaking maneuvers. The physical implementation relies on the F1TENTH platform equipped with a Jetson Xavier NX computer for processing. Sensor inputs are provided by a Hokuyo 2D LiDAR and an Intel RealSense D435 depth camera. The final hardware configuration is illustrated in the figure below.

Opponent pose estimation is performed relative to the RL agent using the on-board sensors. The LiDAR data undergoes object detection through clustering and rectangle fitting, where the most likely opponent cluster is selected based on the previous timestep estimate. Simultaneously, the depth camera utilizes both RGB and depth data. For RGB processing, a YOLOv8 object detection model identifies the opponent car's bounding box. Distance estimation leverages the bounding box height using a reciprocal function.

$$\mathrm{distance} = \frac{a}{b \cdot \mathrm{height} + b} + c$$

The parameters in this equation are determined experimentally. This method remains valid as the opponent is consistently viewed from the side, meaning height variation correlates primarily with distance rather than orientation. The yaw angle is approximated using the Ackermann steering model. Depth data is extracted from the bounding box center pixel location, with an average offset added to account for the measurement point on the opponent car. These measurements are fused using an Unscented Kalman Filter (UKF). The UKF state vector is defined as $[x, y, v_x, v_y]$ with a constant velocity model. Since the sensors operate at different publishing rates and formats, the filter updates on every measurement, with static transformations applied for the sensor frames.
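The constant velocity model and the per-measurement update schedule can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `(timestamp, x, y)` measurement tuple, and the assumption that both sensors observe position only are hypothetical. The UKF correction step (sigma points, Kalman gain) is deliberately omitted, so the state is only propagated between asynchronous measurements.

```python
def predict_constant_velocity(state, dt):
    """Propagate [x, y, v_x, v_y] forward by dt seconds at constant velocity."""
    x, y, vx, vy = state
    return [x + vx * dt, y + vy * dt, vx, vy]


def measure_position(state):
    """Assumed measurement function: each sensor observes (x, y) only."""
    return [state[0], state[1]]


def innovations(state, t0, measurements):
    """Return the residual (measured - predicted position) for each measurement.

    `measurements` is a time-sorted list of (timestamp, x, y) tuples that may
    come from either sensor, so the time step differs from one to the next.
    """
    residuals = []
    t = t0
    for stamp, zx, zy in measurements:
        # Predict up to the arrival time of this measurement.
        state = predict_constant_velocity(state, stamp - t)
        t = stamp
        px, py = measure_position(state)
        # A full UKF would correct `state` here using this residual.
        residuals.append((zx - px, zy - py))
    return residuals
```

In a complete filter, these two functions would play the role of the process and measurement models through which the sigma points are passed.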

For the decision-making process, the authors implement a multi-agent reinforcement learning (MARL) algorithm to learn overtaking maneuvers. The Proximal Policy Optimization (PPO) algorithm is selected to ensure training stability through a clipped loss function.

$$L^{CLIP}(\theta) = \hat{\mathbb{E}}_t \left[ \min\left( r_t(\theta) \hat{A}_t,\ \mathrm{clip}\left(r_t(\theta), 1 - \epsilon, 1 + \epsilon\right) \hat{A}_t \right) \right]$$

Here, $r_t(\theta)$ represents the probability ratio between the new and old policy, $\hat{A}_t$ is the advantage estimate at time step $t$, and $\epsilon$ denotes the clipping range hyperparameter. The neural network architecture consists of a fully connected feedforward network with two hidden layers containing 256 neurons each, activated by ReLU functions. The input layer size is 162, corresponding to the observation size, while the output layer size is 2 with a tanh activation function. In the multi-agent setup, the opponent's actions are generated by the same neural network policy, and data from both agents contribute to the training process.
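To make the clipping behaviour concrete, the following sketch evaluates the clipped surrogate objective on plain Python lists. The function name is hypothetical, and the default `epsilon=0.2` is a commonly used value, not one reported in this summary.

```python
def clipped_surrogate(ratios, advantages, epsilon=0.2):
    """Mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A) over a batch.

    `ratios` are the probability ratios r_t(theta) and `advantages` the
    advantage estimates; epsilon=0.2 is an assumed default.
    """
    terms = []
    for r, a in zip(ratios, advantages):
        clipped_r = max(1.0 - epsilon, min(r, 1.0 + epsilon))
        # Taking the minimum keeps the objective a pessimistic bound.
        terms.append(min(r * a, clipped_r * a))
    return sum(terms) / len(terms)
```

With a positive advantage, a ratio above 1 + epsilon earns no extra objective value, so the policy gains nothing from moving further away from the old policy on that sample. With a negative advantage, the unclipped term is kept when it is the smaller of the two.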

The environment is simulated using the F1TENTH gym, which models the car as a simple bicycle model and handles collisions via a 2D occupancy grid map. The simulation replicates the real-world LiDAR sensor with 1080 points per scan over 270 degrees. The observation space includes agent LiDAR data, odometry for both cars, and future waypoints. To reduce complexity, every 10th LiDAR point is sampled, resulting in 108 points describing surrounding obstacles. Waypoints are derived from a precalculated racing line generated using the minimum curvature approach. The action space comprises continuous steering angle and linear velocity commands. The steering angle is scaled to ± 0.34 rad, and linear velocity ranges from 0 to 3 m/s. The reward signal is a weighted sum of seven components including velocity, progress, overtaking status, raceline adherence, collision avoidance, heading error, and smoothness.
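The LiDAR subsampling and action scaling described above can be sketched as follows. The function and constant names are hypothetical. The linear mapping of the tanh output from [-1, 1] to the 0 to 3 m/s velocity range is an assumption, since the summary states only the resulting ranges.

```python
MAX_STEER_RAD = 0.34   # steering limit stated in the text
MAX_SPEED_MPS = 3.0    # upper velocity limit stated in the text


def downsample_scan(ranges, step=10):
    """Keep every `step`-th LiDAR range (1080 points -> 108 for step=10)."""
    return ranges[::step]


def scale_action(raw_steer, raw_speed):
    """Map tanh outputs in [-1, 1] to physical commands.

    Steering is scaled symmetrically to +/- MAX_STEER_RAD. The velocity
    mapping to [0, MAX_SPEED_MPS] is an assumed linear rescaling.
    """
    steer = raw_steer * MAX_STEER_RAD
    speed = (raw_speed + 1.0) / 2.0 * MAX_SPEED_MPS
    return steer, speed
```

For example, `downsample_scan` applied to a 1080-point scan yields the 108 range values that make up part of the 162-element observation.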

Experiment

  • Opponent pose estimation using a UKF successfully tracks the opponent by fusing camera and LiDAR data, providing robust tracking that prevents the LiDAR from locking onto incorrect objects and enabling continuous updates at 10 Hz despite slower sensor rates.
  • The UKF demonstrates superior performance in the y-direction compared to individual sensors, though it is less accurate than LiDAR alone in the x-direction, with estimation error increasing as distance grows.
  • A real-world RL agent deployed on an F1TENTH car successfully executes overtaking maneuvers without collisions in conditions similar to its training environment, though trajectory oscillations are amplified by real-world noise and delays.
  • The agent fails to generalize to significantly different track geometries, such as smaller looped tracks, highlighting a dependency on training conditions for successful deployment.
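
The per-axis accuracy comparison above is expressed through the root-mean-square error (RMSE) reported in the abstract. A minimal sketch of a per-axis RMSE computation is shown below. The function name and the paired-list input format are hypothetical, and this summary does not state how the ground-truth poses were obtained.

```python
import math


def rmse_per_axis(estimates, ground_truth):
    """Return (rmse_x, rmse_y) for paired lists of (x, y) positions."""
    n = len(estimates)
    sq_x = sum((e[0] - g[0]) ** 2 for e, g in zip(estimates, ground_truth))
    sq_y = sum((e[1] - g[1]) ** 2 for e, g in zip(estimates, ground_truth))
    return math.sqrt(sq_x / n), math.sqrt(sq_y / n)
```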
