OpenRath: Session-Centered Runtime State for Agent Systems
Fukang Wen Zhijie Wang Ruilin Xu
Abstract
Modern agent systems often suffer from fragmented runtime state: transcripts, tool effects, memory events, workspace placement, branch provenance, and replay evidence are recorded separately and become difficult to inspect or reproduce. OpenRath addresses this problem with a PyTorch-like programming model for multi-agent, multi-session systems. The analogy concerns the role of a central, first-class runtime abstraction, not tensor computation. Its core abstraction is Session, the runtime value passed between agents and workflows. A Session is branchable, inspectable, replayable, backend-aware, and composable. It records conversation chunks, sandbox placement, lineage metadata, token usage, pending work, and tool evidence, and it defines where memory interactions enter the runtime record. Because this state is carried by the same value used during program execution, fork, merge, and replay become explicit runtime operations rather than states reconstructed from external traces. OpenRath also defines Sandbox, Tool, Agent, Memory, Workflow, and Selector; Selector turns control flow into runtime-routed decisions. This report presents the programming model, the architecture, the audited milestones, and the evidence protocol. Its claims are limited to controlled runtime properties, while broad quantitative comparisons, live-provider quality, optional-backend availability, and memory quality are left to future evaluation. The central thesis is that Session gives agent systems a first-class runtime value for auditable composition.
One-sentence Summary
The authors propose OpenRath, a PyTorch-like programming model that centralizes fragmented agent runtime state within a first-class Session abstraction, embedding conversation chunks, tool evidence, and lineage metadata into composable runtime values to enable explicit forking, merging, and replay, while restricting its current claims to controlled runtime properties pending broader quantitative evaluation.
Key Contributions
- OpenRath introduces a programming model centered on Session, a first-class runtime value that consolidates conversation chunks, tool outputs, sandbox placement, and lineage metadata into a single composable object flowing through agent execution.
- The framework replaces external trace reconstruction by treating branching, merging, and replay as explicit runtime operations on Session. A dedicated Selector component routes control flow based on this runtime state, ensuring memory interactions and tool calls remain auditable without relying on controller-side conventions.
- The system records token usage, pending work, and memory operations as explicit runtime events on Session rather than hiding them within prompts or external logs. This architecture supports a structured evidence protocol and audited milestone system that enables deterministic replay and systematic debugging.
Introduction
As multi-agent systems transition from isolated demos to complex, long-running workflows, maintaining a clear view of runtime state across tools, memory, and execution branches has become essential for debugging, auditing, and systematic evaluation. Existing frameworks and runtime layers typically handle orchestration, tracing, or memory in isolation, often burying intermediate state in controller code or framework-specific objects. This fragmentation obscures execution paths, making it difficult to reconstruct decisions, audit provenance, or reliably replay runs. The authors address this gap by introducing OpenRath, which proposes the Session as a first-class runtime state object. This connective layer preserves conversation history, tool effects, and memory operations in a single branchable and inspectable structure, enabling seamless state transfer across agents and workflows without forcing existing infrastructure to adopt incompatible representations.
Dataset
• Dataset composition and sources: The authors utilize a curated collection of established agent benchmarks and simulated environments spanning interactive tasks, software engineering, terminal operations, and web or desktop workflows.
• Subset details and filtering: The evaluation suite includes AgentBench and τ-bench for interactive settings, SWE-bench alongside SWE-agent and the human-filtered SWE-bench Verified subset for software engineering, TerminalBench and TerminalWorld for terminal tasks, and a broad range of web and embodied environments such as WebArena, VisualWebArena, WorkArena, OSWorld, WebShop, Mind2Web, ALFWorld, ScienceWorld, GAIA, and TheAgentCompany. Specific subsets apply targeted constraints, including database-state verification in τ-bench and human curation for the SWE-bench Verified split.
• Data usage and processing: Rather than functioning as a training corpus, this collection serves as an interactive evaluation framework. The authors leverage these environments to measure agent outcomes in realistic settings while prioritizing trajectory inspectability and replayability. This approach treats reproducible execution paths as a foundational requirement for reliable performance scoring.
• Additional processing details: The authors do not apply traditional training splits, mixture ratios, or cropping strategies. Instead, the pipeline focuses on runtime state tracking and trajectory validation to ensure transparent and verifiable agent behavior during evaluation.
Method
The authors introduce OpenRath as a programming model that addresses the fragmentation of runtime state in modern agent systems. Instead of treating conversation transcripts, tool logs, memory updates, and branch provenance as separate side channels, the framework promotes these effects into a single, first-class runtime value called Session. The paper's framework diagram contrasts traditional loop-scattered state with the proposed session boundary.
This design ensures that a Session is branchable, inspectable, replayable, and backend-aware. It carries conversation chunks, sandbox placement metadata, lineage records, token usage, pending work, and tool evidence. By keeping this state attached to the value used in program execution, operations such as forking, merging, and replaying become explicit runtime steps rather than states reconstructed from external traces.
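To make the idea of state carried by the value itself more concrete, here is a minimal sketch of what such a Session might look like. Every name in it (`Chunk`, the field names, `append`, `fork`) is a hypothetical stand-in chosen for illustration; the source does not show OpenRath's actual class definitions.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Chunk:
    """One entry in the session record (hypothetical shape)."""
    role: str      # e.g. "user", "assistant", "tool_result"
    content: str


@dataclass(frozen=True)
class Session:
    """Illustrative runtime value; field names are assumptions, not OpenRath's API."""
    chunks: Tuple[Chunk, ...] = ()
    sandbox: Optional[str] = None      # placement handle, if any
    parents: Tuple[str, ...] = ()      # lineage: ids of parent sessions
    tokens_used: int = 0
    session_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    def append(self, role: str, content: str, tokens: int = 0) -> "Session":
        """Return a new Session with one more chunk; the receiver is unchanged."""
        return replace(
            self,
            chunks=self.chunks + (Chunk(role, content),),
            tokens_used=self.tokens_used + tokens,
        )

    def fork(self) -> "Session":
        """Duplicate state under a fresh id and record this session as the parent."""
        return replace(self, session_id=uuid.uuid4().hex, parents=(self.session_id,))
```

Because the sketch uses an immutable value, forking is a cheap copy, and the parent link is part of the value instead of something recovered from a log afterwards.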
To structure agent programs, the authors adopt a PyTorch-inspired programming model. The analogy focuses on architectural interfaces for composable computation rather than tensor mathematics. In this paradigm, a central value flows through reusable modules that expose a uniform transformation contract. The framework defines a compact vocabulary of runtime objects, each with a narrow boundary but preserving the same input-output shape.
Under this mapping, Session functions as the flowing value, analogous to a tensor. Agent modules act as reusable transformations similar to neural network layers, following a forward mapping contract. Workflow serves as a compositional container, while Sandbox handles explicit placement, Memory manages persistent agent-bound state, and Tool defines executable functions. Control flow is routed dynamically by a Selector object, which reads the current Session to decide the next step, ensuring branching and looping remain inspectable runtime decisions rather than hard-coded logic.
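The mapping above can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not OpenRath's API: the `Session` fields, the `choose` method, and the `drafter` and `reviewer` agents are all invented here to show the uniform Session-in, Session-out contract and a Selector that routes by reading the session.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Session:
    """Stand-in for the flowing value; only the fields this sketch needs."""
    chunks: List[str] = field(default_factory=list)
    decisions: List[str] = field(default_factory=list)  # routing record


# An "agent" here is any callable with the uniform Session -> Session contract.
Agent = Callable[[Session], Session]


def drafter(session: Session) -> Session:
    return Session(session.chunks + ["draft"], list(session.decisions))


def reviewer(session: Session) -> Session:
    return Session(session.chunks + ["approved"], list(session.decisions))


class Selector:
    """Reads the current Session and names the next agent, or None to stop."""

    def choose(self, session: Session) -> Optional[str]:
        if "draft" not in session.chunks:
            return "drafter"
        if "approved" not in session.chunks:
            return "reviewer"
        return None


class Workflow:
    """Composes agents; itself callable as Session -> Session."""

    def __init__(self, agents: Dict[str, Agent], selector: Selector, max_steps: int = 10):
        self.agents = agents
        self.selector = selector
        self.max_steps = max_steps

    def __call__(self, session: Session) -> Session:
        for _ in range(self.max_steps):
            name = self.selector.choose(session)
            if name is None:
                break
            session = self.agents[name](session)
            session.decisions.append(name)  # the routing choice lands in the record
        return session
```

Since a `Workflow` has the same call shape as an agent, it can be nested inside another workflow, which is the property the layer analogy is meant to capture.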
The runtime architecture governs how a Session remains inspectable as it moves through agents, tools, sandboxes, and stored artifacts. Rather than introducing separate runtime objects for every phase, OpenRath employs a concise lifecycle. A session is initialized from user or agent context, placed on an execution backend when necessary, transformed by agents or workflows, branched for parallel work, merged after review, and persisted for later replay.
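The last lifecycle step, persisting a session for later replay, could look roughly like the following. The JSON layout and the `persist`, `load`, and `replay` helpers are assumptions made for this sketch; the source does not describe OpenRath's storage format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, Iterator, List


@dataclass
class Session:
    """Hypothetical minimal record: an id, lineage, and ordered chunks."""
    session_id: str
    parents: List[str] = field(default_factory=list)
    chunks: List[Dict[str, str]] = field(default_factory=list)


def persist(session: Session) -> str:
    """Serialize deterministically so identical sessions produce identical text."""
    return json.dumps(asdict(session), sort_keys=True)


def load(payload: str) -> Session:
    """Rebuild a Session from its persisted form."""
    return Session(**json.loads(payload))


def replay(session: Session) -> Iterator[str]:
    """Walk the recorded chunks in order without re-running any tool or model."""
    for index, chunk in enumerate(session.chunks):
        yield f"{index}: [{chunk['role']}] {chunk['content']}"
```

Replay in this sketch only reads the stored record, so it yields the same lines every time for the same persisted session.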
Branching operations duplicate state while preserving parent-child relations. Merging joins compatible sessions and records both parents, with compatibility checks extending to sandbox handles to ensure placement consistency. This approach treats placement as an integral part of the runtime graph.
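A small sketch of branch and merge with lineage tracking follows. Two choices in it are assumptions made for illustration, since the source does not specify them: compatibility is reduced to equal sandbox handles, and merging keeps the shared prefix once and then appends each branch's own chunks. All function names are hypothetical.

```python
from dataclasses import dataclass, replace
from itertools import count
from typing import Optional, Tuple

_ids = count(1)  # deterministic ids keep the sketch easy to follow


@dataclass(frozen=True)
class Session:
    chunks: Tuple[str, ...] = ()
    sandbox: Optional[str] = None      # placement handle
    parents: Tuple[int, ...] = ()      # lineage
    session_id: int = 0


def new_session(chunks=(), sandbox=None, parents=()) -> Session:
    return Session(tuple(chunks), sandbox, tuple(parents), next(_ids))


def fork(parent: Session) -> Session:
    """Copy the parent's state and remember where it came from."""
    return new_session(parent.chunks, parent.sandbox, (parent.session_id,))


def extend(session: Session, chunk: str) -> Session:
    """Return the same session with one more chunk appended."""
    return replace(session, chunks=session.chunks + (chunk,))


def merge(left: Session, right: Session) -> Session:
    """Join two branches, recording both parents (assumed compatibility rule)."""
    if left.sandbox != right.sandbox:
        raise ValueError(
            f"incompatible placement: {left.sandbox!r} vs {right.sandbox!r}"
        )
    shared = 0  # length of the common prefix inherited from the fork point
    for a, b in zip(left.chunks, right.chunks):
        if a != b:
            break
        shared += 1
    merged = left.chunks + right.chunks[shared:]
    return new_session(merged, left.sandbox, (left.session_id, right.session_id))
```

Rejecting a merge across different sandbox handles is one simple way to treat placement as part of the runtime graph; a real system might instead reconcile or migrate the workspaces.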
As a crossing object in the agent runtime stack, Session connects specialized layers without absorbing their responsibilities. The framework integrates with multi-agent APIs, graph runtimes, tracing SDKs, tool protocols, and real-environment evaluation harnesses by making their effects visible within a single Session object.
This ecosystem role allows OpenRath to coexist with existing infrastructure. Graph state records execution checkpoints for schedulers, trace spans record observed events for observers, and Session serves as the live value written for the agent program itself. Evidence is attached directly to this value, enabling fork, merge, handoff, and replay operations to function on ordinary program values.
Tool execution follows a layered path that separates runtime state from the execution backend. The model interacts with FlowToolCall schemas, while the session loop combines built-in and user tools to resolve calls by name. Arguments are validated against the session, and payloads are dispatched through the active sandbox to the backend. Side effects, stdout, artifacts, and errors return as tool-result chunks rather than disappearing into controller flow. Memory interactions are similarly bounded, with recall and commit operations exposed as visible runtime events that update the session record.
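The layered tool path described above can be sketched as follows. Only the name `FlowToolCall` appears in the source; its fields here, along with `Sandbox.run`, `execute_tool_call`, the `word_count` tool, and the chunk layout, are hypothetical and chosen to show the flow of resolve by name, validate, dispatch through the sandbox, and record the result as a chunk.

```python
import inspect
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class FlowToolCall:
    """Assumed shape of a model-issued call: a tool name plus keyword arguments."""
    name: str
    arguments: Dict[str, Any]


@dataclass
class Sandbox:
    """Hypothetical in-process backend; a real one might be a container or VM."""
    name: str = "local"

    def run(self, fn: Callable[..., Any], arguments: Dict[str, Any]) -> Any:
        return fn(**arguments)


@dataclass
class Session:
    sandbox: Sandbox = field(default_factory=Sandbox)
    chunks: List[Dict[str, Any]] = field(default_factory=list)


def word_count(text: str) -> int:
    """Example built-in tool."""
    return len(text.split())


BUILTIN_TOOLS: Dict[str, Callable[..., Any]] = {"word_count": word_count}


def execute_tool_call(
    session: Session,
    call: FlowToolCall,
    user_tools: Optional[Dict[str, Callable[..., Any]]] = None,
) -> Session:
    """Resolve, validate, dispatch, and record the outcome as a tool-result chunk."""
    registry = {**BUILTIN_TOOLS, **(user_tools or {})}  # user tools may override
    chunk: Dict[str, Any] = {
        "type": "tool_result",
        "tool": call.name,
        "sandbox": session.sandbox.name,
    }
    tool = registry.get(call.name)
    if tool is None:
        chunk.update(ok=False, error=f"unknown tool: {call.name}")
    else:
        try:
            # Validate arguments against the tool's signature before dispatch.
            inspect.signature(tool).bind(**call.arguments)
            chunk.update(ok=True, output=session.sandbox.run(tool, call.arguments))
        except Exception as exc:  # errors become part of the record, not lost
            chunk.update(ok=False, error=f"{type(exc).__name__}: {exc}")
    session.chunks.append(chunk)
    return session
```

In this sketch failures are recorded in the same place as successes, so an unknown tool or a bad argument shows up in the session record instead of only in controller logs.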
In multi-agent and multi-session designs, the framework maintains a consistent boundary. Agents act as reusable layers, workflows serve as reusable compositions, and the moving runtime value remains Session. This prevents the introduction of hidden message buses or controller-only traces when scaling from single-agent scripts to nested agent teams. The engineering contract ensures that handoffs, routing, and composition rely entirely on reading and returning Session state, preserving deterministic lineage export, local sandbox packets, and workflow transcripts across complex, multi-agent workflows.
Experiment
The evaluation supports a narrow set of claims about the Session-centered runtime object using deterministic evidence, and it sets explicit scope boundaries for the technical report. The findings are qualitative: they separate implemented runtime semantics from optional integrations, from unverified claims such as broad benchmark superiority or local-memory verification, and from unaddressed safety risks. Any expansion of these boundaries requires supporting evidence first, so that every reported claim stays tied to a documented artifact.