ComplexFuncBench Complex Function Call Evaluation Dataset
ComplexFuncBench (Complex Function Calling Benchmark) is a benchmark dataset for evaluating how well large language models (LLMs) handle complex function calling scenarios. It was developed by researchers from Zhipu AI and Tsinghua University in 2025 to address gaps in existing benchmarks around multi-step and constrained function calls. The accompanying paper is "ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario".
The dataset contains 1,000 complex function calling samples across 5 real-world domains: 600 single-domain samples (150 each for hotels, flights, car rentals, and attractions) and 400 cross-domain samples. The taxi domain has only 2 functions, so it appears only in cross-domain samples. Compared with existing benchmarks, ComplexFuncBench features multi-step and constrained function calls and requires long parameter filling, parameter value reasoning, and long-context handling up to 128k.
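As a quick sanity check on the composition described above, the following minimal sketch tallies the stated sample counts. The dictionary keys, constant names, and the `composition_summary` helper are hypothetical and chosen for illustration only; they are not field names from the released dataset.

```python
# Sample counts as stated in the dataset description.
# Names here are illustrative assumptions, not the dataset's own schema.
SINGLE_DOMAIN_COUNTS = {
    "hotels": 150,
    "flights": 150,
    "car_rentals": 150,
    "attractions": 150,
}
# The taxi domain contributes only to these cross-domain samples.
CROSS_DOMAIN_COUNT = 400


def composition_summary(single_counts, cross_count):
    """Return totals and the cross-domain share for the given counts."""
    single_total = sum(single_counts.values())
    total = single_total + cross_count
    return {
        "single_domain": single_total,
        "cross_domain": cross_count,
        "total": total,
        "cross_share": cross_count / total,
    }


summary = composition_summary(SINGLE_DOMAIN_COUNTS, CROSS_DOMAIN_COUNT)
print(summary)
```

The four single-domain groups account for 600 samples, and the 400 cross-domain samples bring the total to 1,000, so cross-domain samples make up 40% of the benchmark.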