Automated Stereotactic Radiosurgery Planning Using a Human-in-the-Loop Reasoning Large Language Model Agent
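Abstract
Stereotactic radiosurgery (SRS) demands precise dose shaping around critical structures, yet "black-box" artificial intelligence (AI) systems have seen limited clinical adoption because of concerns about transparency. In a retrospective cohort of 41 patients with brain metastases treated with 18 Gy single-fraction SRS, this study tested whether chain-of-thought reasoning improves agentic planning. The authors developed SAGE (Secure Agent for Generative Dose Expertise), an LLM-based planning agent for automated SRS treatment planning. Two variants generated a plan for each case: one using a non-reasoning model and one using a reasoning model. The reasoning variant produced dosimetry comparable to that of human planners on the primary endpoints (PTV coverage, maximum dose, conformity index, gradient index; all p > 0.21) and a significantly lower cochlear dose than the human baseline (p = 0.022). When prompted to improve conformity, the reasoning model exhibited systematic planning behaviors, including prospective constraint verification (457 instances) and trade-off deliberation (609 instances), whereas the non-reasoning model showed essentially none of this prospective checking and deliberation (0 and 7 instances, respectively). Content analysis confirmed that both constraint verification and causal explanation were concentrated in the reasoning agent. The optimization traces can serve as auditable logs and offer a practical path toward transparent automated planning.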
One-sentence Summary
Henry Ford Health and Michigan State University researchers developed SAGE, an LLM-based stereotactic radiosurgery planner using chain-of-thought reasoning to generate transparent, auditable dose plans for brain metastases. Unlike non-reasoning models, SAGE's reasoning variant demonstrated prospective constraint verification and trade-off deliberation, matching human dosimetry on key metrics while significantly reducing cochlear dose in 41 patient cases, offering a path toward clinically adoptable AI planning.
Key Contributions
- Stereotactic radiosurgery (SRS) planning for brain metastases faces clinical adoption barriers due to the opacity of conventional black-box AI systems, which lack transparency in complex scenarios requiring precise dose shaping near critical structures.
- The study introduces SAGE, an LLM-based planning agent that leverages chain-of-thought reasoning to generate auditable optimization traces, enabling systematic constraint verification and trade-off deliberation absent in non-reasoning models.
- In a retrospective cohort of 41 patients, the reasoning variant achieved comparable dosimetry to human planners on primary endpoints (PTV coverage, maximum dose, conformity, and gradient indices; all p>0.21) while significantly reducing cochlear dose (p=0.022) and demonstrating 457 constraint verifications and 609 trade-off deliberations versus near-zero instances in the non-reasoning model.
Introduction
Stereotactic radiosurgery (SRS) for brain metastases demands extreme precision: a single session delivers a high dose near critical organs at risk, so plans need steep dose gradients to spare healthy brain tissue. This complexity strains an already limited pool of specialized planners and restricts SRS access primarily to academic centers. Prior AI-driven planning approaches relied on site-specific neural networks trained on institutional data; they functioned as opaque black boxes and scaled poorly across treatment centers. These limitations hinder clinical adoption, as regulatory frameworks and radiation oncology professionals prioritize explainable decision-making. The authors address this with SAGE, a human-in-the-loop large language model agent designed for iterative, reasoning-driven SRS optimization. They demonstrate that a reasoning-capable LLM, which generates explicit intermediate steps for spatial reasoning and constraint validation, produces auditable decision logs while improving plan quality over non-reasoning models, directly addressing the transparency and geometric-complexity barriers in SRS planning.
Dataset
- The authors use a retrospective dataset of 41 brain metastasis patients treated with single-target stereotactic radiosurgery (SRS) at their institution between 2022 and 2024, adhering to clinical guidelines (18 Gy single fraction).
- Dataset composition includes CT images, segmented anatomical structures, clinical treatment plans, and dosimetric data, all sourced from the Varian Eclipse Treatment Planning System (version 16.1).
- All plans underwent dose calculation via the AAA algorithm (version 15.6.06) with a 1.25 mm dose grid resolution; beam geometry was fixed to match original clinical configurations.
- Dose-volume histogram (DVH) estimation and photon optimization used Eclipse algorithms (version 15.6.05), with retrospective clinical and SAGE-generated plans housed entirely within Eclipse.
- The data are used only for direct comparison between clinical plans and SAGE-generated alternatives; no training splits or mixture ratios apply, since the cohort serves to validate plan quality within the clinical workflow.
- Processing involved strict adherence to institutional protocols, IRB approval, and consistent use of Eclipse tools without additional cropping or metadata construction beyond standard clinical outputs.
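As a rough illustration of the DVH-derived quantities mentioned in this section, the following Python sketch computes a cumulative DVH and target coverage from per-voxel doses. It is not Eclipse's DVH algorithm. The function names, the 0.1 Gy default bin width, and the assumption that all voxels have equal volume are illustrative choices, not details from the source.

```python
import numpy as np


def cumulative_dvh(dose_in_structure, bin_width_gy=0.1):
    """Return (dose_levels, volume_pct) for a non-empty set of voxel doses.

    volume_pct[i] is the percentage of the structure receiving at least
    dose_levels[i] Gy. Equal voxel volumes are assumed (illustrative).
    """
    doses = np.asarray(dose_in_structure, dtype=float).ravel()
    levels = np.arange(0.0, doses.max() + bin_width_gy, bin_width_gy)
    volume_pct = np.array([(doses >= d).mean() * 100.0 for d in levels])
    return levels, volume_pct


def coverage_at(dose_in_structure, rx_dose_gy):
    """Percentage of the structure receiving at least the prescription dose."""
    doses = np.asarray(dose_in_structure, dtype=float).ravel()
    return float((doses >= rx_dose_gy).mean() * 100.0)
```

Calling `coverage_at(ptv_doses, 18.0)` on the doses inside a PTV mask would give the kind of coverage percentage reported for the 18 Gy prescription; `ptv_doses` here is a hypothetical array, not data from the study.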
Method
The authors leverage a dual-variant architecture within the SAGE framework to automate radiation treatment planning, integrating both non-reasoning and reasoning large language models (LLMs) within an iterative optimization loop. Upon initialization, the agent ingests the clinical scenario—including patient anatomy, target volume specifications, spatial relationships between the planning target volume (PTV) and organs at risk (OARs), and the prescription dose (18 Gy in a single fraction)—alongside the current optimizer state, which encapsulates all relevant dosimetric parameters such as DVH metrics for PTV and OARs. The agent is then prompted to achieve target coverage while strictly adhering to OAR constraints.
As the framework diagram shows, the system bifurcates into two parallel execution paths: one for the non-reasoning model (Llama 3.1-70B) and one for the reasoning model (QwQ-32B). Both variants operate through identical iterative cycles comprising LLM-driven parameter adjustment, dose calculation, plan evaluation, and objective updates. Each cycle produces a new set of optimization objectives based on the current state, which feeds back into the next iteration. Optimization terminates when all clinical goals are simultaneously satisfied or after a maximum of ten iterations, at which point the best-performing plan is selected according to deterministic stopping logic.
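The iterative cycle and its stopping rule can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. `PlanState`, `propose`, and `evaluate` are hypothetical names, and ranking plans by the number of clinical goals met is an assumed stand-in for the "best-performing plan" criterion, which the source does not spell out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

MAX_ITERATIONS = 10  # iteration cap stated in the text


@dataclass
class PlanState:
    """One iteration: objectives sent to the optimizer and the resulting metrics."""
    objectives: Dict[str, float]
    metrics: Dict[str, float]
    goals_met: int = 0    # number of clinical goals satisfied (assumed score)
    goals_total: int = 0  # number of clinical goals checked


def run_planning_loop(
    initial_objectives: Dict[str, float],
    propose: Callable[[PlanState], Dict[str, float]],   # hypothetical: LLM proposes new objectives
    evaluate: Callable[[Dict[str, float]], PlanState],  # hypothetical: optimize, compute dose, score
) -> PlanState:
    state = evaluate(initial_objectives)
    history: List[PlanState] = [state]
    for _ in range(MAX_ITERATIONS):
        if state.goals_met == state.goals_total:
            return state  # all clinical goals satisfied simultaneously
        state = evaluate(propose(state))
        history.append(state)
    # Cap reached: deterministic fallback. max() keeps the earliest plan on ties.
    return max(history, key=lambda s: s.goals_met)
```

Passing `propose` and `evaluate` as callables keeps the loop identical for both model variants; only the function that wraps the LLM call would differ.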
Following optimization, the resulting treatment plan enters a human-in-the-loop review stage, where a board-certified medical physicist evaluates whether quantitative clinical criteria are met. Plans failing conformity benchmarks are returned to SAGE with a standardized natural language refinement prompt requesting improved dose conformity while preserving target coverage and OAR constraints. This prompt is uniformly applied across all cases and model variants, ensuring consistent evaluation of the agent’s responsiveness to human feedback. The two-stage architecture thus enables assessment of both autonomous planning capability and adaptive refinement under human guidance.
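To make the conformity check concrete, the sketch below computes the Paddick conformity index and gradient index from a dose grid and a target mask, then flags a plan for refinement. The source does not state which index definitions or benchmark values were used, so the Paddick formulas, the `ci_benchmark` parameter, and all function names are assumptions. In the study the decision is made by a medical physicist, not by code.

```python
import numpy as np


def paddick_conformity_index(dose, target_mask, rx_dose_gy):
    """Paddick CI = TV_PIV^2 / (TV * PIV); 1.0 means perfect conformity.

    Voxel counts stand in for volumes, which assumes equal voxel size.
    """
    piv = np.asarray(dose) >= rx_dose_gy      # prescription isodose volume
    tv = np.asarray(target_mask, dtype=bool)  # target volume
    tv_piv = np.count_nonzero(piv & tv)       # target covered by prescription dose
    denom = np.count_nonzero(tv) * np.count_nonzero(piv)
    return float(tv_piv ** 2 / denom) if denom else 0.0


def gradient_index(dose, rx_dose_gy):
    """Paddick GI = volume at half the prescription dose / volume at prescription dose."""
    dose = np.asarray(dose)
    piv = np.count_nonzero(dose >= rx_dose_gy)
    half = np.count_nonzero(dose >= 0.5 * rx_dose_gy)
    return float(half / piv) if piv else float("inf")


def needs_refinement(ci, ci_benchmark):
    """True when the plan falls short of a (hypothetical) conformity benchmark."""
    return ci < ci_benchmark
```

A plan for which `needs_refinement` returns true would be the kind returned to the agent with the standardized refinement prompt.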
Experiment
- Tested SAGE (LLM-based planning agent) on 41 brain metastasis patients for 18 Gy SRS, comparing reasoning (Qwen QwQ-32B) and non-reasoning (Llama 3.1-70B) variants against human plans
- Reasoning variant achieved primary dosimetry comparable to clinical plans: PTV coverage 96.8% (vs 96.5% clinical, p=0.21), conformity index, gradient index, and max dose (all p>0.21)
- Significantly reduced right cochlear dose versus clinical plans (p=0.022 after BH correction), with all plans meeting safety thresholds
- Upon refinement prompts, reasoning model improved conformity index more consistently (p<0.001) than non-reasoning variant (p=0.007), approaching clinical benchmarks
- Demonstrated deliberative behaviors that were nearly absent in the non-reasoning model: constraint verification (457 vs 0 instances) and trade-off deliberation (609 vs 7 instances)
- Produced five-fold fewer format errors (median 0 vs 3 per patient) while maintaining auditable optimization traces
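The "BH correction" cited for the cochlear dose comparison refers to the Benjamini-Hochberg procedure for controlling the false discovery rate across multiple comparisons. A minimal pure-Python sketch of the adjusted p-value calculation follows. The function name is illustrative, and the source does not say which software the authors used for this step.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Each raw p-value is scaled by the number of tests divided by its rank, and the running minimum prevents an adjusted value from exceeding that of any larger raw p-value.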