
From Static Templates to Dynamic Runtime Graphs: A Survey of Workflow Optimization for LLM Agents

Ling Yue, Kushal Raj Bhandari, Ching-Yun Ko, Dhaval Patel, Shuxin Lin, Nianjun Zhou, Jianxi Gao, Pin-Yu Chen, Shaowu Pan

Abstract

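Large language model (LLM) based systems are increasingly used to solve a wide range of tasks. They do so by composing executable workflows that interleave LLM calls, information retrieval, tool use, code execution, memory updates, and verification. This survey treats such workflows as agentic computation graphs (ACGs) and reviews recent methods for designing and optimizing them.

We organize the literature by when the workflow structure is determined. Here "structure" means three things: which components or agents are included, how they depend on one another, and how information flows between them. This lens separates two families of methods. Static methods fix a reusable workflow scaffold before deployment. Dynamic methods select, generate, or revise the workflow for a particular run, either before or during execution.

We further organize prior work along three dimensions:
(1) when structure is determined;
(2) which part of the workflow is optimized;
(3) the evaluation signals that guide optimization, such as task metrics, verifier signals, preferences, or feedback derived from traces.

We also distinguish reusable workflow templates, run-specific realized graphs, and execution traces. This separates reusable design choices from the structure actually deployed in a given run and from its realized runtime behavior.

Finally, we propose a structure-aware evaluation perspective. It complements downstream task metrics with graph-level properties, execution cost, robustness, and structural variation across inputs. Our goal is to give future work on LLM agent workflow optimization four things: a clear vocabulary, a unified framework for positioning new methods, a comparable view of the existing literature, and reproducible evaluation standards.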

One-sentence Summary

Researchers from Rensselaer Polytechnic Institute and IBM Research propose a unified framework for agentic computation graphs that distinguishes static from dynamic workflow structures for optimizing LLM agent systems. The survey also introduces a structure-aware evaluation perspective intended to improve reproducibility and clarify design choices for complex, tool-intensive workflows.

Key Contributions

  • The paper introduces agentic computation graphs (ACGs) as a unifying abstraction for executable LLM workflows, distinguishing between static methods that fix scaffolds before deployment and dynamic methods that generate or revise structures during execution.
  • A three-dimensional taxonomy is presented to organize existing literature based on when structure is determined, which workflow components are optimized, and the specific evaluation signals that guide the optimization process.
  • A structure-aware evaluation perspective is outlined that complements downstream task metrics with graph-level properties, execution cost, robustness, and structural variation to establish a more reproducible standard for future research.

Introduction

Large language model (LLM) systems are evolving from simple chatbots into complex workflows, modeled as agentic computation graphs, that coordinate tools, code execution, and verification to solve tasks. The overall workflow structure dictates component dependencies and information flow. It often determines system effectiveness and cost more than the capabilities of any individual model.

However, prior research and surveys have largely treated workflow design as a fixed implementation detail, or have focused on adjacent topics such as tool selection and agent collaboration. As a result, the workflow structure itself has rarely been treated as a first-class optimization target.

To fill this gap, the authors introduce a unified framework that treats workflows as agentic computation graphs. It categorizes methods by when the structure is determined, ranging from static offline template search to dynamic runtime generation and editing. The authors further synthesize the literature across optimization targets, feedback signals, and update mechanisms. They also propose an evaluation protocol that separates downstream task metrics from graph-level properties and execution costs.

Dataset

As a survey, the paper does not describe a dataset. The corresponding material is an appendix section (A.1) that catalogs supporting tables, covering node-level prompt optimizers, adjacent routing methods, and background frameworks. Consequently, no information is available on dataset composition, sources, subsets, training splits, or data processing strategies.

Method

The authors introduce the Agentic Computation Graph (ACG) as a unifying abstraction for executable LLM-centered workflows. In this framework, nodes perform atomic actions such as LLM calls, information retrieval, or tool use, while edges encode control, data, or communication dependencies. The overall optimization process follows a cycle where a task input is mapped to an ACG, which is then instantiated as a reusable template. This template is executed to produce a trace, which is subsequently analyzed to optimize, observe, and refine the workflow before deployment.
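The node/edge abstraction can be made concrete with a minimal Python sketch. It assumes each node wraps one atomic action (here a stubbed retrieval, LLM call, or check) and each edge is a data dependency. The `ACG` class, its methods, and the stub functions are hypothetical illustrations, not the authors' code. Real nodes would call a model or tool instead of formatting strings.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical node signature: (task_input, outputs of upstream nodes) -> output.
Action = Callable[[str, Dict[str, str]], str]


@dataclass
class ACG:
    """Toy agentic computation graph: nodes are atomic actions, edges are data dependencies."""
    nodes: Dict[str, Action] = field(default_factory=dict)
    edges: List[Tuple[str, str]] = field(default_factory=list)  # (src, dst)

    def add_node(self, name: str, action: Action) -> None:
        self.nodes[name] = action

    def add_edge(self, src: str, dst: str) -> None:
        self.edges.append((src, dst))

    def run(self, task_input: str) -> Dict[str, str]:
        # graphlib expects a mapping from each node to its predecessors.
        preds: Dict[str, Set[str]] = {name: set() for name in self.nodes}
        for src, dst in self.edges:
            preds[dst].add(src)
        outputs: Dict[str, str] = {}
        # Execute nodes in a dependency-respecting (topological) order.
        for name in TopologicalSorter(preds).static_order():
            upstream = {p: outputs[p] for p in preds[name]}
            outputs[name] = self.nodes[name](task_input, upstream)
        return outputs


# Stub actions standing in for retrieval, an LLM call, and a verifier.
def retrieve(x: str, up: Dict[str, str]) -> str:
    return f"docs({x})"


def answer(x: str, up: Dict[str, str]) -> str:
    return f"answer[{up['retrieve']}]"


def verify(x: str, up: Dict[str, str]) -> str:
    return "ok" if up["answer"].startswith("answer[") else "fail"


graph = ACG()
for name, fn in [("retrieve", retrieve), ("answer", answer), ("verify", verify)]:
    graph.add_node(name, fn)
graph.add_edge("retrieve", "answer")
graph.add_edge("answer", "verify")
outputs = graph.run("q1")
```

Running it gives `outputs["answer"] == "answer[docs(q1)]"` and `outputs["verify"] == "ok"`. Topological execution is only one possible scheduling policy; a real system might run independent nodes in parallel or branch on node outputs.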

As shown in the figure below:

The framework distinguishes between three key objects: the ACG template, the realized graph, and the execution trace.

- The template is a reusable executable specification defined as $\bar{\mathcal{G}} = (\mathcal{V}, \mathcal{E}, \Phi, \Sigma, \mathcal{A})$. Here $\mathcal{V}$ and $\mathcal{E}$ are the nodes and edges, $\Phi$ contains node parameters such as prompts and tools, $\Sigma$ is the scheduling policy, and $\mathcal{A}$ defines the admissible actions.
- The realized graph $\mathcal{G}^{\text{run}}$ is the specific structure actually used for a particular run. It may differ from the template through selection or editing.
- The execution trace $\tau = \{(s_t, a_t, o_t, c_t)\}_{t=1}^{T}$ records the sequence of states, actions, observations, and costs produced during execution.
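The three objects can be kept apart in code as separate types. The sketch below is a hypothetical Python encoding that assumes string node names and a pruning-style selection step. Several names are illustrative and not taken from the paper: the fields `params` (for $\Phi$), `schedule` (for $\Sigma$), and `actions` (for $\mathcal{A}$), and the `realize` helper.

```python
import math
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class ACGTemplate:
    """Reusable specification, loosely mirroring (V, E, Phi, Sigma, A)."""
    nodes: FrozenSet[str]                  # V
    edges: FrozenSet[Tuple[str, str]]      # E
    params: Dict[str, Dict[str, str]]      # Phi: per-node prompts, tools, ...
    schedule: str                          # Sigma: scheduling policy label
    actions: FrozenSet[str]                # A: admissible action types


@dataclass(frozen=True)
class RealizedGraph:
    """Structure actually used for one run (G^run)."""
    nodes: FrozenSet[str]
    edges: FrozenSet[Tuple[str, str]]


@dataclass(frozen=True)
class Step:
    """One trace entry (s_t, a_t, o_t, c_t)."""
    state: str
    action: str
    observation: str
    cost: float


def realize(template: ACGTemplate, keep: FrozenSet[str]) -> RealizedGraph:
    # Selection/pruning: keep the chosen nodes and only the edges among them.
    nodes = template.nodes & keep
    edges = frozenset((u, v) for u, v in template.edges if u in nodes and v in nodes)
    return RealizedGraph(nodes, edges)


template = ACGTemplate(
    nodes=frozenset({"retrieve", "answer", "verify"}),
    edges=frozenset({("retrieve", "answer"), ("answer", "verify")}),
    params={"answer": {"prompt": "Answer using the retrieved documents."}},
    schedule="topological",
    actions=frozenset({"llm_call", "retrieval", "tool_use"}),
)
# This run skips the verifier, so G^run differs from the template.
run_graph = realize(template, frozenset({"retrieve", "answer"}))
trace: List[Step] = [
    Step("start", "retrieval", "3 documents", 0.001),
    Step("have_docs", "llm_call", "draft answer", 0.02),
]
total_cost = math.fsum(step.cost for step in trace)
```

Keeping the template immutable (`frozen=True`) reflects its role as a reusable artifact. Per-run variation lives only in `RealizedGraph` and the trace.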

Workflow optimization methods are categorized by when the structure is determined.

Static methods optimize a reusable template before deployment. They focus on offline template search, node-level optimization, or joint optimization of structure and local configuration.

Dynamic methods determine part of the workflow at inference time, which allows runtime adaptation. This includes:
- selecting and pruning a fixed super-graph;
- generating the workflow before execution based on query difficulty;
- editing the structure during execution in response to feedback.

The optimization objective generally balances task quality $R(\tau; x)$ against execution cost $C(\tau)$, formulated as maximizing $\mathbb{E}[R(\tau; x) - \lambda C(\tau)]$.
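The static/dynamic distinction and the quality-cost objective can be illustrated with a toy Python sketch. It assumes a hypothetical `evaluate(workflow, x)` that returns a reward and a cost for one run.

- The static selector picks the single workflow with the best empirical mean of $R(\tau; x) - \lambda C(\tau)$ on a development set.
- The dynamic selector chooses a workflow per input.

The dynamic version uses the evaluator as an oracle, so it is an idealized upper bound rather than a deployable method. Real dynamic methods must predict or construct the structure before or during execution.

```python
from statistics import mean
from typing import Callable, List, Tuple

# Hypothetical evaluator: (workflow name, input) -> (task reward R, execution cost C).
Evaluator = Callable[[str, str], Tuple[float, float]]


def utility(reward: float, cost: float, lam: float) -> float:
    """Per-run objective R(tau; x) - lambda * C(tau)."""
    return reward - lam * cost


def static_select(workflows: List[str], dev_inputs: List[str],
                  evaluate: Evaluator, lam: float) -> str:
    """Static: one template for all inputs, chosen by mean utility on a dev set."""
    return max(workflows,
               key=lambda w: mean(utility(*evaluate(w, x), lam) for x in dev_inputs))


def dynamic_oracle_select(workflows: List[str], x: str,
                          evaluate: Evaluator, lam: float) -> str:
    """Dynamic (idealized): pick the best workflow for this specific input."""
    return max(workflows, key=lambda w: utility(*evaluate(w, x), lam))


def toy_evaluate(workflow: str, x: str) -> Tuple[float, float]:
    # "cheap" solves only easy inputs at cost 1; "deep" solves everything at cost 5.
    if workflow == "cheap":
        return (1.0 if x == "easy" else 0.0), 1.0
    return 1.0, 5.0


workflows = ["cheap", "deep"]
lam = 0.1
best_static = static_select(workflows, ["easy", "hard"], toy_evaluate, lam)
per_input = {x: dynamic_oracle_select(workflows, x, toy_evaluate, lam)
             for x in ["easy", "hard"]}
```

With $\lambda = 0.1$, the static selector returns `"deep"` (mean utility 0.5 versus 0.4 for `"cheap"`). Per-input selection instead routes `"easy"` to `"cheap"` and `"hard"` to `"deep"`. That is the kind of gain dynamic methods aim for when they can anticipate difficulty.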

The framework also outlines three orthogonal comparison axes:
- optimization target (node, graph, or joint);
- feedback mechanism (metric, verifier, or preference);
- update mechanism (search, generator, or controller/RL).

Evaluation combines structure-aware assessment, downstream task validation, and efficiency metrics. Finally, the authors identify open questions about design trade-offs, such as when static optimization suffices and when dynamic adaptation is necessary. Another open question is the role of verifiers in ensuring workflow validity.
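One way to make these axes operational is to record each method as a small typed card. The enum values follow the axes named in the text. Some details are assumptions for illustration, not the paper's schema: the exact labels (in particular the update-mechanism categories and the split of dynamic methods into sub-cases) and the example method names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Timing(Enum):
    STATIC = "static template"
    SELECT_PRUNE = "dynamic: select/prune super-graph"
    PRE_EXECUTION = "dynamic: pre-execution generation"
    IN_EXECUTION = "dynamic: in-execution editing"


class Target(Enum):
    NODE = "node"
    GRAPH = "graph"
    JOINT = "joint"


class Feedback(Enum):
    METRIC = "metric"
    VERIFIER = "verifier"
    PREFERENCE = "preference"
    TRACE = "trace-derived"


class Update(Enum):
    SEARCH = "search"
    GENERATOR = "generator"
    CONTROLLER_RL = "controller/RL"


@dataclass(frozen=True)
class Card:
    """Hypothetical classification card for one workflow-optimization method."""
    method: str
    timing: Timing
    target: Target
    feedback: Feedback
    update: Update

    @property
    def is_dynamic(self) -> bool:
        return self.timing is not Timing.STATIC


# "MethodA" and "MethodB" are placeholder names, not real methods.
cards: List[Card] = [
    Card("MethodA", Timing.STATIC, Target.GRAPH, Feedback.METRIC, Update.SEARCH),
    Card("MethodB", Timing.IN_EXECUTION, Target.JOINT, Feedback.VERIFIER, Update.CONTROLLER_RL),
]
dynamic_methods = [c.method for c in cards if c.is_dynamic]
```

Using enums rather than free-text descriptions makes comparisons across papers mechanical. For example, `dynamic_methods` here is `["MethodB"]`.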

Experiment

  • A standardized classification card is used to compare methods across stable dimensions like structural settings, optimization levels, and update mechanisms, ensuring consistent evaluation rather than relying on paper-specific descriptions.
  • The analysis indicates that the appropriate algorithm depends heavily on the available signals and evidence. Search works best with trusted evaluators and discrete action spaces, while reinforcement learning suits sequential generation but requires careful reward design.
  • The authors argue that evaluation protocols should separate structure-aware assessment of workflow quality from downstream task validation, so that plausible-looking graph generation is not mistaken for actual task success.
  • They also argue that reporting graph-level properties and robustness under perturbations, such as tool failures or schema drift, is essential to distinguish genuine structural improvements from brute-force compute or uncontrolled cost growth.
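A structure-aware report of this kind can be sketched in a few lines of Python. The sketch assumes a workflow is given as node and edge lists, and that a hypothetical harness `run(x, tool_fails)` returns task success and cost. It checks structural validity (acyclicity) and depth separately from task success, and reports success under a simulated tool failure next to the clean score. The metric names are illustrative, not a protocol defined by the authors.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Callable, Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def graph_report(nodes: Iterable[str], edges: List[Edge]) -> Dict[str, object]:
    """Graph-level properties, computed without running the task."""
    all_nodes: Set[str] = set(nodes) | {n for edge in edges for n in edge}
    preds: Dict[str, Set[str]] = {n: set() for n in all_nodes}
    for u, v in edges:
        preds[v].add(u)
    report: Dict[str, object] = {"num_nodes": len(all_nodes), "num_edges": len(edges)}
    try:
        order = list(TopologicalSorter(preds).static_order())
    except CycleError:
        report.update(valid_dag=False, depth=None)
        return report
    # Longest path length, counted in nodes.
    depth: Dict[str, int] = {}
    for n in order:
        depth[n] = 1 + max((depth[p] for p in preds[n]), default=0)
    report.update(valid_dag=True, depth=max(depth.values(), default=0))
    return report


# Hypothetical harness: (input, simulate_tool_failure) -> (task_success, cost).
Runner = Callable[[str, bool], Tuple[bool, float]]


def task_report(run: Runner, inputs: List[str]) -> Dict[str, float]:
    """Downstream success and cost, clean and under a simulated tool failure."""
    clean = [run(x, False) for x in inputs]
    perturbed = [run(x, True) for x in inputs]
    n = len(inputs)
    return {
        "success": sum(ok for ok, _ in clean) / n,
        "mean_cost": sum(cost for _, cost in clean) / n,
        "success_under_tool_failure": sum(ok for ok, _ in perturbed) / n,
    }


def toy_run(x: str, tool_fails: bool) -> Tuple[bool, float]:
    # Toy assumption: "easy" inputs do not need the tool; others fail without it.
    return (x.startswith("easy") or not tool_fails), 1.0


structure = graph_report(["retrieve", "answer", "verify"],
                         [("retrieve", "answer"), ("answer", "verify"), ("retrieve", "verify")])
cyclic = graph_report(["a", "b"], [("a", "b"), ("b", "a")])
task = task_report(toy_run, ["easy-1", "hard-1"])
```

The toy workflow is a valid DAG of depth 3 and scores 1.0 on clean inputs. Its success drops to 0.5 when the tool fails, a gap that a single downstream metric would hide.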
