FinSearchComp: Towards a Realistic, Expert-Level Evaluation of Financial Search and Reasoning

Abstract

Search has emerged as core infrastructure for LLM-based agents and is widely viewed as critical on the path toward more general intelligence. Finance is a particularly demanding proving ground: analysts routinely conduct complex, multi-step searches over time-sensitive, domain-specific data, making it ideal for assessing both search proficiency and knowledge-grounded reasoning. Yet no existing open financial dataset evaluates the data-searching capability of end-to-end agents, largely because constructing realistic, complicated tasks requires deep financial expertise and time-sensitive data is hard to evaluate. We present FinSearchComp, the first fully open-source agent benchmark for realistic, open-domain financial search and reasoning. FinSearchComp comprises three tasks (Time-Sensitive Data Fetching, Simple Historical Lookup, and Complex Historical Investigation) that closely reproduce real-world financial analyst workflows. To ensure difficulty and reliability, we engage 70 professional financial experts for annotation and implement a rigorous multi-stage quality-assurance pipeline. The benchmark includes 635 questions spanning global and Greater China markets, and we evaluate 21 models (products) on it. Grok 4 (web) tops the global subset, approaching expert-level accuracy, while DouBao (web) leads on the Greater China subset. Experimental analyses show that equipping agents with web search and financial plugins substantially improves results on FinSearchComp, and that the country of origin of models and tools significantly affects performance. By aligning with realistic analyst tasks and providing end-to-end evaluation, FinSearchComp offers a professional, high-difficulty testbed for complex financial search and reasoning.
