ComplexFuncBench Complex Function Call Evaluation Dataset

Date: 4 months ago
Size: 5.21 MB
Organization: Tsinghua University
Publish URL: github.com

ComplexFuncBench (Complex Function Calling Benchmark) is a benchmark dataset for evaluating how well large language models (LLMs) handle complex function calling scenarios. It was developed in 2025 by researchers from Zhipu AI and Tsinghua University to fill gaps in existing benchmarks around multi-step and constrained function calls. The accompanying paper is "ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario".

The dataset contains 1,000 complex function calling samples across 5 real-world domains: 600 single-domain samples (150 each for hotels, flights, car rentals, and attractions) and 400 cross-domain samples. The taxi domain has only 2 functions, so it appears only in cross-domain samples. Compared with existing benchmarks, ComplexFuncBench includes multi-step and constrained function calls that require long parameter filling, parameter value reasoning, and a 128k long context.
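The sample breakdown can be checked with a few lines of Python. This is only an illustrative sketch: the counts come from the description above, while the names `SINGLE_DOMAIN`, `CROSS_DOMAIN`, and `total_samples` are hypothetical and do not correspond to anything in the dataset files.

```python
# Sample counts as stated in the dataset description.
# The variable and function names are illustrative, not part of the dataset.
SINGLE_DOMAIN = {
    "hotels": 150,
    "flights": 150,
    "car rentals": 150,
    "attractions": 150,
}
CROSS_DOMAIN = 400  # the taxi domain appears only in these samples


def total_samples(single_domain, cross_domain):
    """Return (single-domain subtotal, overall total)."""
    single_total = sum(single_domain.values())
    return single_total, single_total + cross_domain


single_total, total = total_samples(SINGLE_DOMAIN, CROSS_DOMAIN)
print(single_total, total)
```

The four single-domain groups sum to the stated 600 samples, and adding the 400 cross-domain samples gives the stated total of 1,000.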

ComplexFuncBench.torrent
  • ComplexFuncBench/
    • README.md (1.6 KB)
    • README.txt (3.2 KB)
    • data/
      • bench.zip (5.21 MB)