Trajectory Planning on ToolBench

Win rate

Evaluation Results

Performance of each model on this benchmark

Model Name	Win rate	Paper Title
Attention Bucket	71.5	Fortify the Shortest Stave in Attention: Enhancing Context Awareness of Large Language Models for Effective Tool Use
GPT4-TOPGUN	86.54	SwissNYF: Tool Grounded LLM Agents for Black Box Setting
GPT4-DFSDT	70.4	ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs
