MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers

Abstract
We introduce MCP-Bench, a benchmark for evaluating large language models (LLMs) on realistic, multi-step tasks that demand tool use, cross-tool coordination, precise parameter control, and planning/reasoning for solving tasks. Built on the Model Context Protocol (MCP), MCP-Bench connects LLMs to 28 representative live MCP servers spanning 250 tools across domains such as finance, traveling, scientific computing, and academic search. Unlike prior API-based benchmarks, each MCP server provides a set of complementary tools designed to work together, enabling the construction of authentic, multi-step tasks with rich input-output coupling. Tasks in MCP-Bench test agents' ability to retrieve relevant tools from fuzzy instructions without explicit tool names, plan multi-hop execution trajectories for complex objectives, ground responses in intermediate tool outputs, and orchestrate cross-domain workflows, capabilities not adequately evaluated by existing benchmarks that rely on explicit tool specifications, shallow few-step workflows, and isolated domain operations. We propose a multi-faceted evaluation framework covering tool-level schema understanding and usage, trajectory-level planning, and task completion. Experiments on 20 advanced LLMs reveal persistent challenges in MCP-Bench. Code and data: https://github.com/Accenture/mcp-bench.
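
For readers unfamiliar with MCP, the sketch below illustrates, at a high level, the agent-side interaction pattern the benchmark exercises at scale: connecting to a live MCP server, enumerating its tool schemas, and invoking a tool. It is a minimal example using the official Python MCP SDK; the server command ("server.py") and tool name ("search_flights") are hypothetical placeholders, not components of MCP-Bench itself.

```python
# Minimal sketch of the agent-side MCP interaction pattern described above.
# Assumes the official Python MCP SDK (`pip install mcp`); the server command
# and tool name below are hypothetical placeholders, not MCP-Bench components.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main() -> None:
    # Launch a (hypothetical) MCP server over stdio and open a client session.
    server = StdioServerParameters(command="python", args=["server.py"])
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # The agent first retrieves tool schemas; in MCP-Bench the LLM must
            # select among such tools from fuzzy instructions, without being
            # given explicit tool names.
            tools = await session.list_tools()
            for tool in tools.tools:
                print(tool.name, "-", tool.description)

            # A single tool invocation; multi-hop tasks chain many such calls,
            # feeding intermediate outputs into later parameters.
            result = await session.call_tool(
                "search_flights",  # hypothetical tool name
                arguments={"origin": "SFO", "destination": "NRT"},
            )
            print(result.content)


if __name__ == "__main__":
    asyncio.run(main())
```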