Microsoft Unveils Agent Lightning for RL-Driven LLM Training

Microsoft has unveiled Agent Lightning, an open-source framework designed to improve multi-agent systems through reinforcement learning (RL). The framework enables performance improvements for large language models (LLMs) by transforming real agent behavior into clean RL transitions, without requiring changes to existing agent architectures.

Agent Lightning models agents as partially observable Markov decision processes (POMDPs): observations correspond to the current inputs, actions represent model calls, and rewards can be either final or intermediate. By capturing call logs together with input, output, and reward data, the framework filters out noise and produces high-quality training transitions.

A key innovation is its "training-agent decoupling" architecture. The Lightning Server handles training and inference serving and exposes an OpenAI-compatible API, so updated models can be used without integration changes. The Lightning Client runs inside existing agent runtimes, collecting execution traces and streaming them back to the server in real time. This design preserves tight integration with tools, browsers, and other dependencies on the agent side while offloading GPU-intensive training to the server layer.

The framework supports two data-collection paths. The default path uses OpenTelemetry, so traces can flow through standard observability pipelines. A lightweight embedded tracer is also available for teams that prefer not to deploy OpenTelemetry. All collected data is centralized for training.

In evaluation, the research team tested Agent Lightning on three complex tasks: text-to-SQL, retrieval-augmented generation, and mathematical reasoning. Text-to-SQL was evaluated on the Spider benchmark, which contains more than 10,000 questions across 200 databases. Retrieval-augmented generation was assessed with the MuSiQue benchmark, built on a Wikipedia-scale index of 21 million documents. Mathematical reasoning was evaluated on the Calc-X dataset, which relies on tool calls for computation. Results showed consistent and stable reward improvements across all three tasks, demonstrating the framework's effectiveness in enhancing agent performance through RL.

The full paper is available at https://arxiv.org/abs/2508.03680v1. Illustrative code sketches of the transition format, the client-side integration, and the telemetry path follow the takeaways below.

Key takeaways:

- Agent Lightning is an open-source framework that optimizes multi-agent systems with RL without requiring architectural changes.
- It models agents as POMDPs and extracts clean, noise-free training transitions from real-world execution.
- Experiments show significant performance gains on text-to-SQL, retrieval-augmented generation, and mathematical reasoning tasks.
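For illustration, a transition of this kind might be represented roughly as follows. The field names and the reward convention are assumptions for the sketch, not the framework's actual schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Transition:
    """One RL transition extracted from a single LLM call inside an agent run.

    Field names are illustrative, not Agent Lightning's actual schema.
    """
    observation: str          # the prompt/input visible to the model at this step
    action: str               # the model's generated output (the "action")
    reward: Optional[float]   # intermediate reward, or None until the final reward is known
    done: bool = False        # True for the last call in an episode

# Example: a text-to-SQL step where only the final execution result is rewarded.
step = Transition(
    observation="Schema: users(id, name) -- Question: How many users are there?",
    action="SELECT COUNT(*) FROM users;",
    reward=1.0,   # e.g., 1.0 if the query executes and matches the gold answer
    done=True,
)
```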
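Because the Lightning Server exposes an OpenAI-compatible API, an existing agent could, in principle, be pointed at it simply by overriding the client's base URL. The endpoint address and model name below are placeholders, not documented defaults.

```python
from openai import OpenAI

# Point an otherwise unchanged agent at the Lightning Server's
# OpenAI-compatible endpoint (URL and model name are placeholders).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="agent-policy",  # the policy model currently being trained and served
    messages=[{"role": "user", "content": "Translate the question into SQL: ..."}],
)
print(response.choices[0].message.content)
```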
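On the default data-collection path, execution traces travel through OpenTelemetry. A minimal sketch of recording one model call as a span might look like this; the attribute names and the console exporter are stand-ins for whatever exporter actually feeds the training side.

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter

# Minimal OpenTelemetry setup; in practice the exporter would ship spans
# to an observability backend the training side can read, not the console.
trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(
    SimpleSpanProcessor(ConsoleSpanExporter())
)
tracer = trace.get_tracer("agent-tracer")

# Record one model call; the attribute names here are illustrative.
with tracer.start_as_current_span("llm_call") as span:
    span.set_attribute("llm.input", "Schema: users(id, name) -- Question: ...")
    span.set_attribute("llm.output", "SELECT COUNT(*) FROM users;")
    span.set_attribute("reward", 1.0)
```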
