14 days ago

DeepAnalyze: Agentic Large Language Models for Autonomous Data Science

Shaolei Zhang, Ju Fan, Meihao Fan, Guoliang Li, Xiaoyong Du

Abstract

Autonomous data science, from raw data sources to analyst-grade deep research reports, has been a long-standing challenge, and is now becoming feasible with the emergence of powerful large language models (LLMs). Recent workflow-based data agents have shown promising results on specific data tasks but remain fundamentally limited in achieving fully autonomous data science due to their reliance on predefined workflows. In this paper, we introduce DeepAnalyze-8B, the first agentic LLM designed for autonomous data science, capable of automatically completing the end-to-end pipeline from data sources to analyst-grade deep research reports. To tackle high-complexity data science tasks, we propose a curriculum-based agentic training paradigm that emulates the learning trajectory of human data scientists, enabling LLMs to progressively acquire and integrate multiple capabilities in real-world environments. We also introduce a data-grounded trajectory synthesis framework that constructs high-quality training data. Through agentic training, DeepAnalyze learns to perform a broad spectrum of data tasks, ranging from data question answering and specialized analytical tasks to open-ended data research. Experiments demonstrate that, with only 8B parameters, DeepAnalyze outperforms previous workflow-based agents built on most advanced proprietary LLMs. The model, code, and training data of DeepAnalyze are open-sourced, paving the way toward autonomous data science.
