Together AI Launches DeepSWE: An Open-Source RL-Trained Coding Agent That Tops the SWE-Bench Verified Benchmark
Together AI has unveiled DeepSWE, a cutting-edge, fully open-source software engineering agent trained exclusively through reinforcement learning (RL). Built on the Qwen3-32B language model, DeepSWE achieves 59% accuracy on the SWE-Bench Verified benchmark with test-time scaling and a 42.2% Pass@1 score, leading open-weight models. The launch marks a significant transition for Together AI: from traditional pre-training pipelines to autonomous language agents that learn and improve through real-world interaction.

Reinforcement Learning Meets Code Generation

DeepSWE was developed by post-training Qwen3-32B with rLLM, Agentica's customizable reinforcement learning framework for language agents. Unlike conventional supervised fine-tuning, rLLM lets the model adapt and improve based on feedback from its own interactions, which makes it well suited to complex software engineering tasks. DeepSWE has been honed specifically for challenges such as bug fixing, function completion, and code editing through this feedback-driven process rather than through static datasets.

The training pipeline uses Agentica's R2E-Gym dataset, a benchmark designed specifically for software engineering with RL. The dataset emphasizes action-oriented goals, ensuring the model learns tasks that require iterative reasoning and precision, characteristics essential for code synthesis.

Performance Benchmarks and Capabilities

On the SWE-Bench Verified benchmark, DeepSWE demonstrates 59% accuracy with test-time scaling, surpassing previous open-weight models. In Pass@1 evaluations, which measure the likelihood that the agent solves a problem correctly on the first try, DeepSWE achieves 42.2%. These results highlight the effectiveness of RL in enhancing a model's ability to handle complex, dynamic tasks that demand continuous improvement and adaptation.
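To make the Pass@1 metric concrete: when an evaluation draws n samples per problem and c of them pass the tests, pass@k is commonly computed with the unbiased estimator popularized by the HumanEval benchmark. The article does not state DeepSWE's exact evaluation harness, so the following is an illustrative sketch of that standard estimator, not the project's own code:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator.

    n: total samples generated per problem
    c: number of samples that passed all tests
    k: number of samples the metric considers
    """
    if n - c < k:
        # Fewer failures than k draws: at least one success is guaranteed.
        return 1.0
    # Probability that a random size-k subset contains no correct sample,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)

# With k = 1 this reduces to the fraction of correct samples:
# pass_at_k(8, 4, 1) == 0.5
```

For k = 1 the estimator is simply c / n, i.e. the empirical first-try success rate averaged over problems.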
Open Source and Reproducibility at Its Core

Transparency is a cornerstone of this release. Together AI and Agentica have open-sourced not only the DeepSWE model but also the entire training methodology, including the rLLM framework, the R2E-Gym dataset, and the training configuration scripts. This level of openness enables reproducibility and lets the broader research and developer communities extend or build upon DeepSWE without constraint. Developers can access DeepSWE and the rLLM framework through the channels the companies provide, easing integration into existing workflows and projects.

From Language Reasoners to Language Agents

DeepSWE represents a pivotal shift in the philosophy and practice of AI development: from building models that reason about language to creating agents that learn through interaction. While traditional language models excel at reasoning, they often fall short where adaptive learning and real-world feedback are required. Reinforcement learning addresses this gap, allowing the model to perform well at launch and continue improving as it adapts to new challenges and domains.

The approach also supports local deployment: organizations can customize and retrain DeepSWE for specific use cases. Leveraging the modular design of rLLM, developers and researchers can build agents tailored to domains such as web navigation, robotics, or autonomous research assistance.

Conclusion

DeepSWE stands as a landmark in the advancement of generative AI for software engineering. By integrating reinforcement learning with large language models and making the entire training infrastructure openly available, Together AI is paving the way for continuous learning and improvement in AI agents. This shift from language understanding to action-oriented agency holds significant promise for programming, automation, and the development of intelligent systems.
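The feedback-driven training described above ultimately rests on a simple signal: run the candidate patch against the task's tests and reward success. The sketch below illustrates that idea in miniature, with illustrative names and a single-file setup rather than rLLM's actual sandboxed, repository-level harness:

```python
import subprocess
import sys
import tempfile
import textwrap

def execution_reward(patch_code: str, test_code: str) -> float:
    """Binary execution-based reward: 1.0 if all tests pass, else 0.0.

    A minimal sketch, assuming a single-file patch plus assertion-style
    tests; real SWE RL environments apply edits to full repositories
    inside a sandbox.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        # Concatenate the candidate patch and its tests into one script.
        f.write(textwrap.dedent(patch_code) + "\n" + textwrap.dedent(test_code) + "\n")
        path = f.name
    # Run the script; a non-zero exit code means a test (or the patch) failed.
    result = subprocess.run([sys.executable, path], capture_output=True)
    return 1.0 if result.returncode == 0 else 0.0
```

During training, the agent's proposed edits are scored this way and the sparse pass/fail reward drives the policy update, which is what lets the model improve from interaction rather than from a static supervised dataset.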