Foundation Protocol: A Coordination Layer for the Agentic Society
Abstract
Autonomous agents are shifting from tools to a layer of social infrastructure: they browse the internet, make purchases, deploy software, manage systems, and increasingly interact with one another. As these systems scale, the bottleneck moves from raw model capability to coordination. Agents must build reliable relationships, organize work in multi-agent systems, exchange value, support an AI economy, and remain safe and accountable under real-world oversight. This paper introduces the Foundation Protocol (FP), a graph-first coordination layer for an emerging human-AI society. FP unifies heterogeneous entities, including agents, tools, resources, humans, institutions, and organizations, and supports native multi-party organization as well as event-based collaboration. It also provides economic primitives for metering, receipts, and settlement, and treats policy, provenance, and audit as first-class concerns. FP is designed to wrap and bridge existing protocols rather than replace them, which enables incremental adoption while reducing integration and governance overhead. The goal is to keep autonomous agency composable while accountability remains non-negotiable, so that coordination itself can become shared infrastructure for an open, pluralistic, and governable human-AI society.
One-sentence Summary
The authors introduce Foundation Protocol (FP), a graph-first coordination layer that unifies agents, tools, resources, and institutions through native multi-party organization, event-based collaboration, and economic primitives for metering, receipts, and settlement, while bridging existing protocols to elevate policy, provenance, and audit to first-class concerns and ensure composable, accountable human-AI collaboration.
Key Contributions
- The paper introduces the Foundation Protocol (FP), a graph-first coordination layer that unifies agents, tools, humans, and institutions into a single operational substrate. This architecture standardizes entities, sessions, and event traces to enable cross-protocol interoperation and native multi-party collaboration across diverse agentic workflows.
- FP provides ledger-agnostic economic primitives and policy enforcement mechanisms that treat metering, settlement, and provenance tracking as first-class protocol features. These components enable auditable transactions and compliance enforcement without relying on specific payment rails or application-specific governance tools.
- The architecture decouples a stable core from profiles, extensions, and bridges to facilitate incremental adoption alongside existing standards such as MCP, A2A, and A2UI. This modular design reduces integration and governance overhead while preserving accountability for large-scale human-AI coordination.
Introduction
Autonomous agents are transitioning from isolated tools to persistent participants in a hybrid human-AI society where they delegate authority, exchange value, and operate across organizational boundaries. This evolution shifts the primary bottleneck from model capability to coordination, making robust interoperability and governance critical for enabling safe, scalable economic and social workflows. Existing protocols such as MCP, A2A, and UCP solve specific interaction problems but result in significant fragmentation when applied to complex, cross-domain tasks. This siloed landscape causes semantic drift, breaks provenance chains, and creates patchwork oversight, ultimately forcing systems into either costly custom integrations or fragile, hard-to-audit architectures. The authors propose the Foundation Protocol, a graph-native coordination layer that models agents, tools, resources, and humans as addressable entities within a unified structure. This protocol introduces first-class primitives for multi-party organization, event-based collaboration, and ledger-agnostic economic settlement while embedding policy enforcement and auditability directly into the communication substrate. By designing the protocol to wrap and bridge existing standards, the authors facilitate incremental adoption and ensure that autonomous agency remains composable without sacrificing accountability.
Dataset
- Dataset composition and sources: The authors compile structured lifecycle traces generated by policy decisions within an AI company scenario. These records originate from reviewer approvals, founder deployment authorizations, payment checkpoint budget decisions, and system-rejected messages.
- Key details for each subset: The dataset is organized into event logs that capture each decision point. Provenance records bind every event to its governing policies and supporting evidence, forming a continuous evidence spine that tracks action ownership, policy context, economic outcomes, and any overridden access control decisions.
- Usage in the model: The authors use these traces to build a tamper-evident audit trail. This data enables post-execution inspection by parties absent during execution, supports efficient dispute resolution with external providers like GPU vendors, and demonstrates how the Entity & Trust and Regulation & Oversight planes operate together.
- Processing and metadata construction: Records are secured through signatures on envelopes to ensure tamper evidence. The pipeline extracts policy enforcement points and dispute signals to maintain a unified audit log, allowing users to query the full decision history without reconstructing state from scattered logs.
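The tamper-evidence idea in the last bullet can be illustrated with a hash-chained log. This is a minimal sketch, not the authors' implementation: the function names, the record fields, and the `GENESIS` marker are hypothetical, and an HMAC from the Python standard library stands in for whatever envelope signature scheme a real deployment would choose.

```python
import hashlib
import hmac
import json

GENESIS = "genesis"  # hypothetical marker for the first link in the chain


def _payload(record: dict, prev: str) -> bytes:
    # Canonical JSON so that signer and verifier hash identical bytes.
    return json.dumps({"record": record, "prev": prev}, sort_keys=True).encode()


def append_record(key: bytes, chain: list, record: dict) -> dict:
    """Append a record bound to the digest of the previous entry."""
    prev = chain[-1]["digest"] if chain else GENESIS
    payload = _payload(record, prev)
    entry = {
        "record": record,
        "prev": prev,
        "digest": hashlib.sha256(payload).hexdigest(),
        "sig": hmac.new(key, payload, hashlib.sha256).hexdigest(),
    }
    chain.append(entry)
    return entry


def verify_chain(key: bytes, chain: list) -> bool:
    """Return True only if every entry is intact and correctly linked."""
    prev = GENESIS
    for entry in chain:
        payload = _payload(entry["record"], entry["prev"])
        expected_sig = hmac.new(key, payload, hashlib.sha256).hexdigest()
        if entry["prev"] != prev:
            return False
        if entry["digest"] != hashlib.sha256(payload).hexdigest():
            return False
        if not hmac.compare_digest(entry["sig"], expected_sig):
            return False
        prev = entry["digest"]
    return True
```

Because each entry commits to the digest of its predecessor, editing or removing any earlier record invalidates every later link, which is what lets a party absent during execution inspect the history afterwards.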
Method
The Foundation Protocol (FP) adopts a graph-native architecture, conceptualizing agentic systems as a network of entities (nodes) connected by relationships (edges), with interactions represented as activities over the graph. This view underpins a plane-based design that isolates core protocol semantics while making extension points explicit. The architecture consists of four primary planes—Entity & Trust, Transport & Routing, Interaction & Organization, and Regulation & Oversight—alongside a configuration and profiles layer that binds the core to concrete implementations. Each plane addresses a distinct structural aspect of the graph: the Entity & Trust Plane establishes identity, capabilities, and trust signals for participants; the Transport & Routing Plane manages addressing, discovery, and message delivery across diverse communication channels; the Interaction & Organization Plane defines the primitives for collaboration, including sessions, events, and economic transactions; and the Regulation & Oversight Plane provides mechanisms for policy enforcement, audit, and dispute resolution. These planes are designed to remain stable while allowing variability in transport, identity, and domain-specific patterns through profiles and extensions. The framework ensures that coordination primitives are consistent across heterogeneous systems, enabling scalable and auditable interactions among autonomous agents, tools, services, and humans.
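The graph view described above can be sketched as a small data structure. The class name, the relation labels, and the entity identifiers below are illustrative assumptions, not part of FP.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class EntityGraph:
    """Entities are nodes; typed relationships are directed edges."""
    kinds: dict = field(default_factory=dict)  # entity id -> kind
    edges: dict = field(default_factory=lambda: defaultdict(set))  # source -> {(relation, target)}

    def add_entity(self, entity_id: str, kind: str) -> None:
        self.kinds[entity_id] = kind

    def relate(self, source: str, relation: str, target: str) -> None:
        # Edges may only connect entities that are already registered.
        if source not in self.kinds or target not in self.kinds:
            raise KeyError("both endpoints must be registered entities")
        self.edges[source].add((relation, target))

    def related(self, source: str, relation: str) -> set:
        """Return the targets reachable from source over one edge of this type."""
        return {t for (r, t) in self.edges.get(source, set()) if r == relation}
```

A human delegating to an agent that invokes a tool then becomes two edges, for example `relate("alice", "delegates_to", "planner")` and `relate("planner", "invokes", "search")`.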
The core semantics of FP are defined by a minimal vocabulary of seven objects: Entity, Session, Activity, Envelope, Event, Receipt/Settlement, and Provenance. This vocabulary is intentionally generic, enabling the protocol to express a wide range of interactions—from tool calls and multi-agent collaboration to organizational workflows and commerce—while remaining stable as higher-level patterns evolve. The Entity object represents any addressable participant, such as a human, agent, tool, or organization, and is characterized by its identity, capabilities, trust signals, and privacy constraints. The Session object is an explicit container for multi-party collaboration, binding participants, roles, policies, and optional budgets, which makes group interactions legible and enforceable. The Activity object captures the events and streams that occur within a session, providing ordering, correlation, and backpressure, allowing systems to maintain observability as they scale. The Envelope object standardizes message delivery, preserving message integrity and confidentiality through signing and encryption. The Receipt and Settlement objects represent economic primitives, such as metered usage, receipts, and settlement references, enabling auditable value exchange without mandating a specific payment rail. The Provenance object captures the evidence and policy decisions that govern interactions, ensuring that critical decisions are traceable and verifiable by third parties. This consistent vocabulary ensures that interactions across the protocol are coherent and interoperable, regardless of the underlying implementation.
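One way to picture this vocabulary is as a set of plain record types. The sketch below is an assumption-laden illustration: every field name is hypothetical and chosen only to mirror the prose, and the summary does not specify FP's actual schemas.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Entity:
    address: str                          # globally unique address
    kind: str                             # "human", "agent", "tool", "organization", ...
    capabilities: tuple = ()


@dataclass
class Session:
    session_id: str
    roles: dict                           # entity address -> role name
    policy_refs: tuple = ()
    budget: Optional[int] = None          # optional spend limit, in minor currency units


@dataclass(frozen=True)
class Event:
    session_id: str
    seq: int                              # ordering within the session
    event_type: str
    correlation_id: Optional[str] = None


@dataclass
class Activity:
    """The ordered stream of events that occurs within one session."""
    session_id: str
    events: list = field(default_factory=list)

    def emit(self, event_type: str, correlation_id: Optional[str] = None) -> Event:
        event = Event(self.session_id, len(self.events), event_type, correlation_id)
        self.events.append(event)
        return event


@dataclass(frozen=True)
class Envelope:
    sender: str
    recipient: str
    event: Event
    signature: Optional[str] = None       # filled in by whichever signing scheme is used


@dataclass(frozen=True)
class Receipt:
    session_id: str
    payer: str
    payee: str
    units: int
    settlement_ref: Optional[str] = None  # opaque pointer to an external payment rail


@dataclass(frozen=True)
class Provenance:
    event_seq: int
    policy_ref: str
    decision: str                         # e.g. "allow" or "deny"
    evidence_hash: str
```

Keeping `settlement_ref` as an opaque string reflects the stated goal of auditable value exchange without mandating a specific payment rail.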
The Entity & Trust Plane forms the foundation of the protocol, providing a unified model for all addressable participants. Each entity is identified by a globally unique address and exposes four key pieces of information: identity (identifiers, keys, versioning), capabilities (capability statements), trust signals (attestations, reputation), and privacy controls (permissions, ownership). To minimize overhead, FP employs progressive disclosure, where capability statements begin as short summaries containing purpose, risk tags, schema hashes, or pricing hints, with full details fetched only upon selection or authorization. This approach reduces token usage and avoids the common pattern of copying large tool specifications into a model's working context prematurely. Entity identity is the unit of accountability, allowing organizations to be represented as entities with their own keys and policies, and membership to be modeled as a first-class edge with scoped delegation. The protocol does not prescribe a specific identity scheme, supporting DIDs, WebPKI, or enterprise systems, but makes the basic structure explicit so that other planes can rely on it. Trust is treated through hooks for attestations, stakes, and reputation providers, enabling deployments to begin with local trust and gradually interoperate across domains without reducing trust decisions to ad hoc application logic.
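Progressive disclosure can be sketched as a registry that hands out short summaries freely and releases the full schema only on authorized request. The registry class, its method names, and the boolean `authorized` flag are hypothetical simplifications; a real deployment would derive authorization from the entity's keys and policies.

```python
import hashlib
import json
from dataclasses import dataclass


def hash_schema(schema: dict) -> str:
    """Stable digest of a schema, computed over canonical JSON."""
    return hashlib.sha256(json.dumps(schema, sort_keys=True).encode()).hexdigest()


@dataclass(frozen=True)
class CapabilitySummary:
    entity: str
    purpose: str
    risk_tags: tuple
    schema_hash: str
    pricing_hint: str = ""


class CapabilityRegistry:
    def __init__(self):
        self._summaries = {}
        self._schemas = {}

    def publish(self, entity, purpose, risk_tags, schema, pricing_hint=""):
        summary = CapabilitySummary(
            entity, purpose, tuple(risk_tags), hash_schema(schema), pricing_hint
        )
        self._summaries[entity] = summary
        self._schemas[entity] = schema
        return summary

    def summaries(self):
        """Cheap listing: short summaries only, no full schemas."""
        return list(self._summaries.values())

    def fetch_full(self, entity, authorized: bool) -> dict:
        """Release the full schema only after selection and authorization."""
        if not authorized:
            raise PermissionError(f"not authorized to fetch details for {entity}")
        schema = self._schemas[entity]
        # The caller can trust the schema matches what the summary advertised.
        if hash_schema(schema) != self._summaries[entity].schema_hash:
            raise ValueError("schema does not match the advertised hash")
        return schema
```

An agent choosing among many tools would scan `summaries()` and call `fetch_full` only for the one it selects, so large specifications stay out of the model's context until they are needed.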
The Transport & Routing Plane is designed to be transport-agnostic, defining what message delivery must preserve—such as addressing, discovery, channel setup, and termination—without choosing the underlying transport. This design ensures resilience to changes in network stacks and deployment environments, from local IPC to web-native transports and long-running asynchronous channels. Routing is a critical component, as agentic interactions rarely remain point-to-point; a group session may span multiple transports simultaneously, such as local IPC to a tool, HTTP to a remote agent, and SSE to a user interface. FP treats transport as a binding beneath a consistent addressing and trace layer, allowing messages to move across different channels while preserving ordering, backpressure, termination semantics, and a coherent record of the interaction. This is essential for scaling from a handful of cooperating agents to large networks and organizations without losing observability. The protocol's configuration and profiles layer binds the core semantics to concrete transports, identity methods, and deployment environments through profiles, registries, pattern libraries, and bridges, enabling incremental adoption rather than a disruptive migration.
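The idea of transport as a binding beneath a shared addressing and trace layer can be sketched as follows. `SessionRouter`, its methods, and the envelope fields are illustrative assumptions; the send functions stand in for real IPC, HTTP, or SSE clients.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SessionRouter:
    """Routes envelopes for one session across several transport bindings."""
    session_id: str
    bindings: dict = field(default_factory=dict)  # address -> (transport name, send fn)
    trace: list = field(default_factory=list)     # one ordered record per delivery

    def bind(self, address: str, transport: str, send: Callable[[dict], None]) -> None:
        self.bindings[address] = (transport, send)

    def deliver(self, recipient: str, body: dict) -> dict:
        transport, send = self.bindings[recipient]
        # The sequence number comes from the session trace, not from the transport,
        # so ordering stays coherent even when channels differ.
        envelope = {
            "session": self.session_id,
            "seq": len(self.trace),
            "to": recipient,
            "body": body,
        }
        self.trace.append({**envelope, "transport": transport})
        send(envelope)
        return envelope
```

Swapping the transport for an address only changes the binding; the trace keeps the same shape, which is the property the paragraph above relies on for observability.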
The Interaction & Organization Plane provides the primitives for multi-party collaboration. Schemas define the structure of messages and codecs, while events and streams provide ordering, correlation, replay, and backpressure, forming a trace layer that is observable by agents, operators, user interfaces, and auditors. Sessions and organizations capture groups, roles, and delegation as first-class objects, making group interaction legible and enforceable. A session is an explicit container that binds participants, roles, policy references, and optional budgets, such as spend limits or token ceilings. This makes it clear that a bidder in an auction, a reviewer in a regulated workflow, and a tool provider in a pipeline are all represented as roles within a session rather than as application-specific special cases. Economic primitives standardize metering, receipts, settlement references, and dispute signals, enabling value exchange to be audited without mandating a specific payment rail. Without backpressure, a slow consumer would have to ingest every event at the same rate as a fast one, and the trace it sees would end up either incomplete or too expensive to follow. FP keeps collaboration observable without forcing every participant to consume the stream in the same way.
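Two of these primitives, session budgets with metered receipts and backpressure toward slow consumers, can be sketched together. All names here are hypothetical, amounts are integers in minor currency units to avoid rounding issues, and a bounded `queue.Queue` stands in for whatever flow-control mechanism a real stream binding would provide.

```python
import queue
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised when a metered charge would exceed the session's spend limit."""


@dataclass
class Session:
    session_id: str
    roles: dict                      # entity address -> role
    spend_limit: int                 # minor currency units
    spent: int = 0
    receipts: list = field(default_factory=list)

    def meter(self, payer: str, payee: str, units: int, unit_price: int) -> dict:
        # Only participants holding a role in the session may exchange value.
        for party in (payer, payee):
            if party not in self.roles:
                raise PermissionError(f"{party} holds no role in {self.session_id}")
        cost = units * unit_price
        if self.spent + cost > self.spend_limit:
            remaining = self.spend_limit - self.spent
            raise BudgetExceeded(f"charge of {cost} exceeds remaining budget {remaining}")
        self.spent += cost
        receipt = {
            "session": self.session_id,
            "payer": payer,
            "payee": payee,
            "units": units,
            "cost": cost,
            "settlement_ref": None,  # filled in later by an external payment rail
        }
        self.receipts.append(receipt)
        return receipt


def offer(subscriber: queue.Queue, event: dict) -> bool:
    """Hand an event to a consumer; False tells the producer to slow down."""
    try:
        subscriber.put_nowait(event)
        return True
    except queue.Full:
        return False
```

Giving each consumer its own bounded queue lets a fast auditor and a slow user interface follow the same stream at different rates, with a full queue acting as the explicit backpressure signal.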
The Regulation & Oversight Plane treats safety as a protocol concern rather than an application afterthought. It provides a common place for policy evaluation, enforcement decisions, audit and provenance records, monitoring signals, compliance hooks, and dispute escalation. This aligns with the emerging economic reality that as autonomous systems scale, verification and accountability become scarce resources, and systems that produce low-cost evidence will be easier to deploy, govern, and trust. Critical decisions can be checked at protocol boundaries, such as before invocation or settlement, and the resulting evidence can be validated by third parties without exposing sensitive payloads. The oversight plane is intentionally decoupled from any single organization, supporting deployments that run policy locally, delegate checks to a compliance service, or provide evidence to an external auditor or regulator. Policies and provenance records can be referenced, hashed, and verified independently of the payloads they govern, making audit portable. The same interaction trace can be inspected under different policies without replaying the interaction. Oversight also covers failure cases, with disputes, revocations, and safety reports being first-class events, allowing networks to propagate trust-relevant information through explicit channels instead of relying on informal warnings, private logs, or prompt-level conventions.
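The combination of boundary checks, payload-independent evidence, and re-inspection under a different policy can be sketched as below. The policy shape (a single `max_amount` threshold), the record fields, and the function names are all assumptions made for illustration.

```python
import hashlib
import json


def digest(obj) -> str:
    """Stable SHA-256 digest over canonical JSON."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()


def decide(policy: dict, amount: int) -> str:
    return "allow" if amount <= policy["max_amount"] else "deny"


def enforce(policy: dict, checkpoint: str, amount: int, payload: dict) -> dict:
    """Evaluate a policy at a protocol boundary and emit a provenance record."""
    return {
        "checkpoint": checkpoint,        # e.g. "pre-invocation" or "pre-settlement"
        "amount": amount,
        "decision": decide(policy, amount),
        "policy_hash": digest(policy),
        "payload_hash": digest(payload), # the payload itself is never stored
    }


def verify_record(record: dict, policy: dict) -> bool:
    """A third party holding only the policy can check the recorded decision."""
    if record["policy_hash"] != digest(policy):
        return False
    return record["decision"] == decide(policy, record["amount"])


def reaudit(trace: list, policy: dict) -> list:
    """Re-evaluate a stored trace under another policy without replaying it."""
    return [decide(policy, record["amount"]) for record in trace]
```

Because the record carries hashes rather than payloads, an external auditor can confirm which policy governed a decision, and can ask what a stricter policy would have decided, without seeing the sensitive content.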
The configuration and profiles layer is designed to keep the protocol core small and stable by making variability explicit. Profiles bind the core semantics to concrete transports, identity methods, and deployment environments, while registries publish schema and event-type catalogs. Pattern libraries describe reusable multi-party interaction templates, such as auctions, workflows, and bargaining, and bridges adapt existing ecosystems to FP's envelope, trace, policy, and evidence model. This separation clarifies what belongs in FP and what belongs around it: the core defines the objects and interaction semantics that must work across domains, while profiles choose wire formats and transport bindings, extensions add event types or interaction patterns, and bridges map external protocols into FP activities. This boundary keeps implementations lightweight and helps the protocol avoid becoming a monolith, enabling incremental adoption around existing systems.
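A bridge of the kind described here can be pictured as a pure mapping function. The sketch below assumes a JSON-RPC 2.0 tool-call request of the form MCP uses (`method` set to `tools/call`, with `name` and `arguments` under `params`); the FP-side envelope fields and the `tool.invoke` event type are hypothetical.

```python
import hashlib
import json


def bridge_tool_call(request: dict, session_id: str, sender: str,
                     recipient: str, seq: int) -> dict:
    """Wrap a JSON-RPC tool call in a hypothetical FP envelope."""
    if request.get("method") != "tools/call":
        raise ValueError("this bridge only handles tool-call requests")
    params = request["params"]
    canonical = json.dumps(request, sort_keys=True).encode()
    return {
        "session": session_id,
        "seq": seq,
        "from": sender,
        "to": recipient,
        "event_type": "tool.invoke",           # hypothetical FP event type
        "correlation_id": str(request["id"]),  # lets the reply be matched back
        "body": {
            "tool": params["name"],
            "arguments": params.get("arguments", {}),
        },
        "source_protocol": "mcp",
        # Hash of the original request, so the mapping can be audited later.
        "source_hash": hashlib.sha256(canonical).hexdigest(),
    }
```

The external request is left untouched and only wrapped, which matches the stated intent that bridges map existing protocols into FP activities rather than replace them.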
