TaTToo: Tool-Grounded Thinking PRM for Test-Time Scaling in Tabular Reasoning
Jiaru Zou, Soumya Roy, Vinay Kumar Verma, Ziyi Wang, David Wipf, Pan Lu, Sumit Negi, James Zou, Jingrui He

Abstract
Process Reward Models (PRMs) have recently emerged as a powerful framework for enhancing the reasoning capabilities of large reasoning models (LRMs), particularly in the context of test-time scaling (TTS). However, their potential for supervising LRMs on tabular reasoning domains remains underexplored. Through detailed empirical analyses, we identify that existing PRMs, though widely adopted for supervising text-only reasoning steps, struggle with table-specific operations such as sub-table retrieval and schema interaction, leading to critical performance bottlenecks. To address this limitation, we propose TaTToo, a novel table-grounded PRM framework that (i) reasons explicitly over tabular reasoning steps and (ii) integrates tool-based verification to provide precise reward supervision. Concretely, we first design a scalable data curation pipeline that constructs over 60k high-quality step-level annotations by integrating table verification rationales with tool-based executions. Building on the collected data, we train TaTToo with a dual-stage paradigm: cold-start supervised fine-tuning to capture tool-use reasoning patterns, followed by reinforcement learning with tool-grounded reward shaping to align our model with table-based verification. We provide a comprehensive evaluation of the policy improvement induced by our newly designed PRM. Across 5 challenging tabular reasoning benchmarks covering numerical reasoning, fact-checking, and data analysis, TaTToo improves downstream policy LRMs by 30.9% at inference, surpasses strong PRM baselines such as Qwen-2.5-Math-PRM-72B with only 8B parameters, and demonstrates strong generalizability across diverse TTS strategies.