Deep Think with Confidence

Large Language Models (LLMs) have shown great potential in reasoning tasks through test-time scaling methods like self-consistency with majority voting. However, this approach often leads to diminishing returns in accuracy and high computational overhead. To address these challenges, we introduce Deep Think with Confidence (DeepConf), a simple yet powerful method that enhances both reasoning efficiency and performance at test time. DeepConf leverages model-internal confidence signals to dynamically filter out low-quality reasoning traces during or after generation. It requires no additional model training or hyperparameter tuning and can be seamlessly integrated into existing serving frameworks. We evaluate DeepConf across a variety of reasoning tasks and the latest open-source models, including the Qwen 3 and GPT-OSS series. Notably, on challenging benchmarks such as AIME 2025, DeepConf@512 achieves up to 99.9% accuracy and reduces generated tokens by up to 84.7% compared to full parallel thinking.
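
The core idea of confidence-filtered voting can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes each trace already carries a scalar confidence score (e.g., derived from token log-probabilities), and the `keep_ratio` knob and `deepconf_vote` helper are hypothetical simplifications of the filtering criteria described in the paper.

```python
from collections import Counter

def deepconf_vote(traces, keep_ratio=0.5):
    """Majority-vote over only the most confident reasoning traces.

    traces: list of (answer, confidence) pairs; confidence is a
        trace-level score, assumed precomputed from model-internal
        signals such as mean token log-probability.
    keep_ratio: fraction of highest-confidence traces to keep
        (a hypothetical knob for illustration).
    """
    # Rank traces by confidence, highest first.
    ranked = sorted(traces, key=lambda t: t[1], reverse=True)
    # Filter out the low-confidence tail, keeping at least one trace.
    kept = ranked[: max(1, int(len(ranked) * keep_ratio))]
    # Majority vote among the surviving traces' final answers.
    votes = Counter(answer for answer, _ in kept)
    return votes.most_common(1)[0][0]

# Five sampled traces: three confident ones agree on "42",
# two low-confidence ones say "17".
traces = [("42", -0.2), ("42", -0.3), ("17", -2.5),
          ("42", -0.4), ("17", -3.0)]
print(deepconf_vote(traces))  # → 42
```

Filtering before voting is what lets the method cut token cost: in the online variant described above, low-confidence traces can be terminated during generation rather than scored after the fact.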