8 months ago

Method/Architecture

Shi Dingfeng, Cao Jingyi, Chen Qianben, Sun Weichen, Li Weizhen

Abstract

Agentic tasks, which require multi-step problem solving with autonomy, tool use, and adaptive reasoning, are becoming increasingly central to the advancement of NLP and AI. However, existing instruction data lacks tool interaction, and current agentic benchmarks rely on costly human annotation, limiting their scalability. We introduce TaskCraft, an automated workflow for generating difficulty-scalable, multi-tool, and verifiable agentic tasks with execution trajectories. TaskCraft expands atomic tasks using depth-based and width-based extensions to create structurally and hierarchically complex challenges. Empirical results show that these tasks improve prompt optimization in the generation workflow and enhance supervised fine-tuning of agentic foundation models. We present a large-scale synthetic dataset of approximately 36,000 tasks with varying difficulty to support future research on agent tuning and evaluation.
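The abstract does not spell out how the two extensions work, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes that a depth-based extension chains a new step onto an existing task (the original question becomes a sub-question that must be resolved first), and that a width-based extension merges independent tasks into one compound task. The `Task` class, the function names `depth_extend` and `width_extend`, and the tool names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Task:
    """Hypothetical task record: a question, its verifiable answer,
    and the ordered tool calls (trajectory) needed to reach it."""
    question: str
    answer: str
    steps: List[str] = field(default_factory=list)


def depth_extend(base: Task, template: str, step: str, answer: str) -> Task:
    """Assumed depth-based extension: wrap the base question inside a new
    question, so the base task must be solved before the new step."""
    question = template.format(sub=base.question)
    return Task(question, answer, base.steps + [step])


def width_extend(tasks: List[Task]) -> Task:
    """Assumed width-based extension: merge independent tasks into one
    compound task whose answer combines all sub-answers."""
    question = " Also: ".join(t.question for t in tasks)
    answer = "; ".join(t.answer for t in tasks)
    steps = [s for t in tasks for s in t.steps]
    return Task(question, answer, steps)


# Two atomic tasks, each solvable with a single (hypothetical) tool call.
atomic_a = Task("Which city hosted the 2008 Summer Olympics?", "Beijing", ["web_search"])
atomic_b = Task("What is 12 * 12?", "144", ["calculator"])

# Depth: the answer to atomic_a becomes an intermediate result.
deep = depth_extend(
    atomic_a,
    "Of which country is the answer to '{sub}' the capital?",
    "web_search",
    "China",
)

# Width: combine the deepened task with an unrelated atomic task.
wide = width_extend([deep, atomic_b])
print(len(wide.steps), wide.answer)
```

In this sketch the number of steps serves as a crude proxy for difficulty: each depth extension adds one step, and each width extension sums the steps of its parts, which is one simple way difficulty could be made scalable.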

