5 months ago

Humza Nusrat Luke Francisco Bing Luo Hassan Bagher-Ebadian Joshua Kim Karen Chin-Snyder Salim Siddiqui Mira Shah Eric Mellon Mohammad Ghassemi

Abstract

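Stereotactic radiosurgery (SRS) requires precise dose shaping around critical structures, yet concerns about transparency limit the clinical adoption of black-box artificial intelligence (AI) systems. In a retrospective cohort of 41 patients with brain metastases treated with single-fraction 18 Gy SRS, we tested whether chain-of-thought reasoning improves agentic planning. We developed SAGE (Secure Agent for Generative Dose Expertise), an LLM-based planning agent for automated SRS treatment planning. Two variants generated a plan for each case: one using a non-reasoning model and one using a reasoning model. The reasoning model produced dosimetry comparable to that of human planners on the primary endpoints (PTV coverage, maximum dose, conformity index, and gradient index; all p > 0.21) and delivered a significantly lower cochlear dose than the human baseline (p = 0.022). When prompted to improve conformity, the reasoning model showed systematic planning behaviors, including prospective constraint verification (457 instances) and trade-off deliberation (609 instances), whereas the baseline model showed almost none (0 and 7 instances, respectively). Content analysis confirmed that constraint verification and causal explanation were both concentrated in the reasoning agent. The optimization traces serve as auditable logs and offer a practical path toward transparent automated planning.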

One-sentence Summary

Henry Ford Health and Michigan State University researchers developed SAGE, an LLM-based stereotactic radiosurgery planner using chain-of-thought reasoning to generate transparent, auditable dose plans for brain metastases. Unlike non-reasoning models, SAGE's reasoning variant demonstrated prospective constraint verification and trade-off deliberation, matching human dosimetry on key metrics while significantly reducing cochlear dose in 41 patient cases, offering a path toward clinically adoptable AI planning.

Key Contributions

Stereotactic radiosurgery (SRS) planning for brain metastases faces clinical adoption barriers due to the opacity of conventional black-box AI systems, which lack transparency in complex scenarios requiring precise dose shaping near critical structures.
The study introduces SAGE, an LLM-based planning agent that leverages chain-of-thought reasoning to generate auditable optimization traces, enabling systematic constraint verification and trade-off deliberation absent in non-reasoning models.
In a retrospective cohort of 41 patients, the reasoning variant achieved comparable dosimetry to human planners on primary endpoints (PTV coverage, maximum dose, conformity, and gradient indices; all $p > 0.21$) while significantly reducing cochlear dose ($p = 0.022$) and demonstrating 457 constraint verifications and 609 trade-off deliberations versus near-zero instances in the non-reasoning model.

Introduction

Stereotactic radiosurgery (SRS) for brain metastases demands extreme precision because a single high-dose session is delivered near critical organs at risk, requiring steep dose gradients to spare healthy brain tissue. This complexity strains an already limited pool of specialized planners and restricts SRS access primarily to academic centers. Prior AI-driven planning approaches relied on site-specific neural networks trained on institutional data, functioning as opaque black boxes with limited transparency and poor scalability across treatment centers. These limitations hinder clinical adoption, as regulatory frameworks and radiation oncology professionals prioritize explainable decision-making. The authors address this with SAGE, a human-in-the-loop large language model agent designed for iterative, reasoning-driven SRS optimization. They demonstrate that a reasoning-capable LLM, which generates explicit intermediate steps for spatial reasoning and constraint validation, produces auditable decision logs while improving plan quality over a non-reasoning model, directly addressing the transparency and geometric complexity barriers in SRS planning.

Dataset

The authors use a retrospective dataset of 41 brain metastasis patients treated with single-target stereotactic radiosurgery (SRS) at their institution between 2022 and 2024, adhering to clinical guidelines (18 Gy single fraction).
Dataset composition includes CT images, segmented anatomical structures, clinical treatment plans, and dosimetric data, all sourced from the Varian Eclipse Treatment Planning System (version 16.1).
All plans underwent dose calculation via the AAA algorithm (version 15.6.06) with a 1.25 mm dose grid resolution; beam geometry was fixed to match original clinical configurations.
Dose-volume histogram (DVH) computation and photon optimization used Eclipse algorithms (version 15.6.05), with retrospective clinical and SAGE-generated plans housed entirely within Eclipse.
The data are used for direct comparison between clinical plans and SAGE-generated alternatives. No training splits or mixture ratios are applied; the cohort is used to validate plan quality within the clinical workflow.
Processing involved strict adherence to institutional protocols, IRB approval, and consistent use of Eclipse tools without additional cropping or metadata construction beyond standard clinical outputs.
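To make the DVH-based quantities mentioned above concrete, the following is a minimal sketch of how target coverage and a cumulative DVH can be derived from per-voxel doses. It is not the Eclipse implementation. The function names, the equal-voxel-volume assumption, and the sample doses are all illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def coverage_fraction(ptv_doses_gy: Sequence[float], prescription_gy: float) -> float:
    """Fraction of PTV voxels receiving at least the prescription dose.

    Assumes every voxel has the same volume (an illustrative simplification).
    """
    if not ptv_doses_gy:
        raise ValueError("PTV contains no voxels")
    covered = sum(1 for dose in ptv_doses_gy if dose >= prescription_gy)
    return covered / len(ptv_doses_gy)


def cumulative_dvh(doses_gy: Sequence[float], bin_width_gy: float = 1.0) -> List[Tuple[float, float]]:
    """Return (dose threshold, fraction of volume receiving >= threshold) pairs."""
    if not doses_gy:
        raise ValueError("structure contains no voxels")
    if bin_width_gy <= 0:
        raise ValueError("bin width must be positive")
    n_bins = int(max(doses_gy) // bin_width_gy) + 1
    points = []
    for i in range(n_bins):
        threshold = i * bin_width_gy
        fraction = sum(1 for dose in doses_gy if dose >= threshold) / len(doses_gy)
        points.append((threshold, fraction))
    return points


# Hypothetical per-voxel doses for a tiny PTV, in Gy (not study data).
sample_ptv = [17.5, 18.0, 18.4, 19.2, 20.1]
sample_coverage = coverage_fraction(sample_ptv, prescription_gy=18.0)
sample_dvh = cumulative_dvh(sample_ptv, bin_width_gy=5.0)
```

A clinical system would weight each voxel by its actual volume and interpolate dose on the calculation grid; the sketch only shows the counting logic behind a cumulative DVH point.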

Method

The authors leverage a dual-variant architecture within the SAGE framework to automate radiation treatment planning, integrating both non-reasoning and reasoning large language models (LLMs) within an iterative optimization loop. Upon initialization, the agent ingests the clinical scenario—including patient anatomy, target volume specifications, spatial relationships between the planning target volume (PTV) and organs at risk (OARs), and the prescription dose (18 Gy in a single fraction)—alongside the current optimizer state, which encapsulates all relevant dosimetric parameters such as DVH metrics for PTV and OARs. The agent is then prompted to achieve target coverage while strictly adhering to OAR constraints.

As shown in the paper's framework diagram, the system splits into two parallel execution paths: one for the non-reasoning model (Llama 3.1-70B) and one for the reasoning model (QwQ-32B). Both variants operate through identical iterative cycles comprising LLM-driven parameter adjustment, dose calculation, plan evaluation, and objective updates. Each cycle produces a new set of optimization objectives based on the current state, which feeds back into the next iteration. Optimization terminates when all clinical goals are simultaneously satisfied, or after a maximum of ten iterations, at which point the best-performing plan is selected according to deterministic stopping logic.
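The iterative cycle can be sketched as a short control loop. This is a minimal illustration under stated assumptions, not the SAGE implementation: `PlanState`, `run_planning_loop`, `evaluate`, and `propose` are hypothetical names, `evaluate` stands in for dose calculation plus plan evaluation, `propose` stands in for the LLM-driven objective update, and ranking plans by the number of goals met is an assumed stand-in for the paper's deterministic selection logic. Only the ten-iteration cap and the stop-when-all-goals-met rule come from the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

MAX_ITERATIONS = 10  # iteration cap stated in the text


@dataclass
class PlanState:
    objectives: Dict[str, float]  # optimizer objectives that produced this plan
    goals_met: int                # number of clinical goals satisfied
    goals_total: int              # number of clinical goals checked


def run_planning_loop(
    initial_objectives: Dict[str, float],
    evaluate: Callable[[Dict[str, float]], PlanState],
    propose: Callable[[PlanState], Dict[str, float]],
    max_iterations: int = MAX_ITERATIONS,
) -> Tuple[PlanState, List[PlanState]]:
    """Iterate until every goal is met or the cap is reached; return best plan and trace."""
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    objectives = dict(initial_objectives)
    history: List[PlanState] = []
    best = None
    for _ in range(max_iterations):
        state = evaluate(objectives)      # stand-in: dose calculation + evaluation
        history.append(state)
        # Assumed ranking rule: more goals met is better, earliest plan wins ties.
        if best is None or state.goals_met > best.goals_met:
            best = state
        if state.goals_met == state.goals_total:
            break                         # all clinical goals satisfied
        objectives = propose(state)       # stand-in: LLM-driven objective update
    return best, history


# Toy stand-ins so the loop can be exercised without a planning system.
def toy_evaluate(objectives: Dict[str, float]) -> PlanState:
    coverage_pct = 90 + 2 * int(objectives["ptv_priority"])
    return PlanState(dict(objectives), goals_met=int(coverage_pct >= 95), goals_total=1)


def toy_propose(state: PlanState) -> Dict[str, float]:
    return {"ptv_priority": state.objectives["ptv_priority"] + 1}


best_plan, trace = run_planning_loop({"ptv_priority": 0}, toy_evaluate, toy_propose)
```

Keeping the full `trace` alongside the selected plan mirrors the idea of an auditable optimization record: every intermediate state remains available for review.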

Following optimization, the resulting treatment plan enters a human-in-the-loop review stage, where a board-certified medical physicist evaluates whether quantitative clinical criteria are met. Plans failing conformity benchmarks are returned to SAGE with a standardized natural language refinement prompt requesting improved dose conformity while preserving target coverage and OAR constraints. This prompt is uniformly applied across all cases and model variants, ensuring consistent evaluation of the agent’s responsiveness to human feedback. The two-stage architecture thus enables assessment of both autonomous planning capability and adaptive refinement under human guidance.
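The review gate can be illustrated with a small check on a conformity metric. In the study the review is performed by a medical physicist; the code below only sketches the kind of quantitative criterion involved. It assumes the Paddick conformity index, which the summary does not confirm as the definition used, and the threshold value, prompt wording, function names, and sample volumes are all hypothetical.

```python
from typing import Optional

# Hypothetical wording that paraphrases the standardized prompt described in the text.
REFINEMENT_PROMPT = (
    "Improve dose conformity while preserving target coverage and OAR constraints."
)


def paddick_conformity_index(
    target_volume_cc: float,
    prescription_isodose_volume_cc: float,
    target_volume_covered_cc: float,
) -> float:
    """Paddick CI = TV_PIV^2 / (TV * PIV); a value of 1.0 is ideal."""
    if target_volume_cc <= 0 or prescription_isodose_volume_cc <= 0:
        raise ValueError("volumes must be positive")
    return target_volume_covered_cc ** 2 / (
        target_volume_cc * prescription_isodose_volume_cc
    )


def review_plan(conformity_index: float, ci_threshold: float) -> Optional[str]:
    """Return the refinement prompt if the plan misses the benchmark, else None."""
    if conformity_index >= ci_threshold:
        return None
    return REFINEMENT_PROMPT


# Hypothetical volumes in cubic centimetres (not study data).
example_ci = paddick_conformity_index(
    target_volume_cc=2.0,
    prescription_isodose_volume_cc=2.5,
    target_volume_covered_cc=1.9,
)
feedback = review_plan(example_ci, ci_threshold=0.8)  # 0.8 is an assumed benchmark
```

Because the same prompt string is returned for every failing plan, the sketch reflects the uniform-prompt design described above, in which feedback does not vary by case or model variant.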

Experiment

Tested SAGE (LLM-based planning agent) on 41 brain metastasis patients for 18 Gy SRS, comparing reasoning (Qwen QwQ-32B) and non-reasoning (Llama 3.1-70B) variants against human plans
Reasoning variant achieved equivalent primary dosimetry to clinicians: PTV coverage 96.8% (vs 96.5% clinical, p=0.21), conformity index, gradient index, and max dose (all p>0.21)
Significantly reduced right cochlear dose versus clinical plans (p=0.022 after BH correction), with all plans meeting safety thresholds
Upon refinement prompts, reasoning model improved conformity index more consistently (p<0.001) than non-reasoning variant (p=0.007), approaching clinical benchmarks
Demonstrated exclusive deliberative behaviors: constraint verification (457 instances) and trade-off deliberation (609 instances) absent in non-reasoning model (0 and 7 instances)
Produced five-fold fewer format errors (median 0 vs 3 per patient) while maintaining auditable optimization traces
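The BH correction mentioned in the results refers to the Benjamini-Hochberg procedure for controlling the false discovery rate across multiple comparisons. The sketch below is a minimal pure-Python version for illustration; the function name and the example p-values are hypothetical and are not the study's values.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four comparisons (not study data).
raw_p = [0.01, 0.04, 0.03, 0.20]
adjusted_p = benjamini_hochberg(raw_p)
```

An adjusted value below the chosen significance level (commonly 0.05) is then reported as significant after correction.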

Automated stereotactic radiosurgery planning using a human-in-the-loop reasoning large language model agent
