DeepAnalyze: Agentic Large Language Models for Autonomous Data Science
Shaolei Zhang, Ju Fan, Meihao Fan, Guoliang Li, Xiaoyong Du

Abstract
Autonomous data science, from raw data sources to analyst-grade deep research reports, has been a long-standing challenge and is now becoming feasible with the emergence of powerful large language models (LLMs). Recent workflow-based data agents have shown promising results on specific data tasks but remain fundamentally limited in achieving fully autonomous data science due to their reliance on predefined workflows. In this paper, we introduce DeepAnalyze-8B, the first agentic LLM designed for autonomous data science, capable of automatically completing the end-to-end pipeline from data sources to analyst-grade deep research reports. To tackle high-complexity data science tasks, we propose a curriculum-based agentic training paradigm that emulates the learning trajectory of human data scientists, enabling LLMs to progressively acquire and integrate multiple capabilities in real-world environments. We also introduce a data-grounded trajectory synthesis framework that constructs high-quality training data. Through agentic training, DeepAnalyze learns to perform a broad spectrum of data tasks, ranging from data question answering and specialized analytical tasks to open-ended data research. Experiments demonstrate that, with only 8B parameters, DeepAnalyze outperforms previous workflow-based agents built on the most advanced proprietary LLMs. The model, code, and training data of DeepAnalyze are open-sourced, paving the way toward autonomous data science.