
Tongyi DeepResearch Unveiled: How Synthetic Data and Agentic Training Power a Smaller Yet Smarter AI Model

What does the future of large language model (LLM) training look like? With the recent release of Tongyi DeepResearch by Alibaba's Tongyi Lab, a new paradigm is emerging, one centered on synthetic data and agentic reasoning. This open-source research model has outperformed both OpenAI's o3 and Deep Research on a range of complex tasks, despite being significantly smaller: it has only 30 billion total parameters, with just 3 billion activated per token. In contrast, its open-source competitors, DeepSeek v3.1 (671B parameters) and Kimi Researcher (based on Kimi v2, with 1 trillion parameters), are vastly larger.

So how did Tongyi DeepResearch achieve such strong performance at a fraction of the size? The answer lies not in sheer scale, but in a sophisticated training approach built on synthetic data and advanced agentic reasoning.

At its core, Tongyi DeepResearch extends the classic ReAct (Reasoning + Acting) framework into what the team calls the Iterative Deep Research Paradigm. This enables the model to perform deep, multi-step investigations by generating and refining its own reasoning trajectories during training.

A key innovation is the AgentFounder training schema, introduced in the paper Scaling Agents via Continual Pre-training. This two-stage process begins with Stage I pretraining at a 32K context length, followed by Stage II with a much longer 128K context window, allowing the model to handle extended reasoning chains and complex, multi-turn tasks more effectively.

To generate high-quality training data, the team employed two novel synthesis methods: First-Order Action Synthesis (FAS) and Higher-Order Action Synthesis (HAS). FAS moves beyond static knowledge statements like "Paris is the capital of France" by anchoring facts to entities, such as "France: Tourist arrivals in France reached 4,222 thousand in June 2025." This entity-anchored approach yields a more dynamic and diverse question-answering dataset, enabling richer reasoning.
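To make the entity-anchoring idea concrete, here is a minimal Python sketch of turning an entity-anchored fact into a question-answer training pair. The data structure and function names are illustrative assumptions for this article, not code from the Tongyi DeepResearch pipeline, and a real FAS system would use an LLM rather than a fixed template to phrase the question.

```python
from dataclasses import dataclass


@dataclass
class EntityFact:
    """A fact anchored to a specific entity, per the FAS example in the text."""
    entity: str      # anchor entity, e.g. "France"
    statement: str   # the fact tied to that entity


def synthesize_qa(fact: EntityFact) -> dict:
    """Turn an entity-anchored fact into one QA training pair.

    A production system would prompt an LLM here; a fixed template
    keeps the sketch self-contained.
    """
    question = f"What recent figure has been reported for {fact.entity}?"
    return {
        "entity": fact.entity,
        "question": question,
        "answer": fact.statement,
    }


fact = EntityFact(
    entity="France",
    statement="Tourist arrivals in France reached 4,222 thousand in June 2025.",
)
pair = synthesize_qa(fact)
```

Because each fact is keyed to an entity rather than stated in isolation, many different facts about the same entity can be combined into varied, multi-hop questions, which is what makes the resulting dataset richer than static statement-style knowledge.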
HAS takes this further by using large language models to generate multiple reasoning and action candidates at each step. Instead of relying on a single, fixed trajectory, the model explores different decision paths during training, enhancing its ability to adapt and reason flexibly without altering the task's final binary outcome. This method effectively simulates diverse problem-solving strategies, improving robustness and depth.

These techniques are supported by broader research from Alibaba's team, including WebShaper, which formalizes information-seeking behavior into structured agent trajectories, and WebSailor-V2, which uses synthetic data and scalable reinforcement learning to close the gap between open and proprietary agents. Another key paper, WebSailor, demonstrates how agents can achieve super-human reasoning through iterative exploration and refinement.

The success of Tongyi DeepResearch signals a shift in LLM development: the future is not just about bigger models, but smarter training. By leveraging synthetic, agent-generated data and iterative reasoning frameworks, smaller models can match or surpass much larger ones. This marks the dawn of a new era, one in which the quality and structure of training data, not just model size, define AI capabilities.
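The branching behavior HAS relies on can be sketched in a few lines of Python. This is an assumption-laden toy, not the paper's implementation: `propose_candidates` stands in for an LLM call that drafts alternative next steps, and a real system would score and filter the resulting trajectories rather than sample uniformly.

```python
import random


def propose_candidates(state: str, k: int = 3) -> list[str]:
    # Stand-in for an LLM call that drafts k alternative reasoning/action
    # steps from the current trajectory state.
    return [f"{state} -> action_{i}" for i in range(k)]


def sample_trajectory(question: str, depth: int, k: int = 3, seed: int = 0) -> list[str]:
    """Build one trajectory by choosing among k candidates at every step,
    instead of committing to a single fixed chain of actions."""
    rng = random.Random(seed)
    steps = [question]
    for _ in range(depth):
        candidates = propose_candidates(steps[-1], k)
        steps.append(rng.choice(candidates))  # explore one sampled branch
    return steps


traj = sample_trajectory("What is the capital of France?", depth=2)
```

Running this repeatedly with different seeds yields distinct paths to the same question, which mirrors how higher-order synthesis exposes the model to diverse decision sequences while the final answer being supervised stays fixed.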

Related Links