CAS Introduces AutoThink: Large Models Capable of Autonomous Thinking Mode Switching
Efficient Inference Strategy AutoThink: Enabling Large Models to Decide When to Think Deeply

As large models continue to advance rapidly, more and more of them are developing "deep thinking capabilities." The DeepSeek-R1 series, for instance, uses a prompt structure in which the model first produces a "think" phase and then an "answer" phase: a detailed reasoning process with self-reflection and verification precedes the final answer, which significantly improves the model's ability to solve complex problems. It also causes "overthinking," where the model produces excessive reasoning steps even for simple tasks. Asked "What is 2+3?", the model might start from the definition of natural numbers, list the commutative property of addition, and repeatedly test different solutions before finally confirming that the answer is 5. Such unnecessary overthinking is a widespread problem in reasoning models.

To address this, researchers from the Chinese Academy of Sciences' Institute of Automation and the Pengcheng Laboratory have developed an efficient inference strategy called AutoThink. AutoThink enables large models to switch their thinking mode autonomously according to task difficulty, using a combination of a specially designed prompt and multi-stage reinforcement learning to guide the model in deciding whether deep thinking is necessary.

AutoThink rests on two core techniques:

1. Minimal prompt intervention. An "ellipsis prompt" (a prompt with an appended ellipsis) activates the model's ability to switch randomly between thinking modes, so the model can decide for itself whether to engage in in-depth reasoning without heavy-handed prompting (a minimal prompt sketch appears below).

2. Multi-stage reinforcement learning. A three-stage training process: the first stage stabilizes both fast and slow thinking within the model, with "fast thinking" used for simple problems and "slow thinking" for complex ones; the second stage optimizes the model's behavior in both modes to improve accuracy; the third stage refines the model's outputs so the reasoning process becomes more concise and effective (a toy reward sketch also follows below).

After this multi-stage training, the model no longer decides at random whether to think deeply. Much like a human, it selects a thinking mode based on the complexity of the question: simple questions are answered directly, while complex ones receive thorough reasoning. This "on-demand thinking" makes the model more efficient and practical.

Traditional methods either require manual control over the thinking mode or make no distinction between easy and hard questions, relying on fixed strategies that compress the reasoning process or applying extensive but redundant reasoning to every task. AutoThink stands out by adapting its thinking mode automatically to the difficulty of each problem.

The research team validated AutoThink on multiple mathematical benchmarks and R1-style base models. Their experiments showed that AutoThink not only improves the performance of R1-distilled baseline models but also reduces the number of inference tokens used by about 40%.
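To make the ellipsis-prompt idea concrete, here is a minimal sketch of how such a prompt could be assembled and how one might check which mode the model chose for a given completion. The chat tags, the template wording, and the `used_slow_thinking` heuristic are illustrative assumptions based on the description above, not the exact prompt or evaluation code released with AutoThink.

```python
import re

# Hypothetical R1-style template whose <think> block is pre-filled with an
# ellipsis; the exact tags and wording used by AutoThink are assumptions.
ELLIPSIS_PROMPT = "<|User|>{question}<|Assistant|><think>\n...\n"


def build_prompt(question: str) -> str:
    """Insert the question into a template that opens the think block with an
    ellipsis, nudging the model to choose for itself whether to keep
    reasoning or to close the block immediately and answer directly."""
    return ELLIPSIS_PROMPT.format(question=question)


def used_slow_thinking(completion: str, min_reasoning_chars: int = 20) -> bool:
    """Rough mode check: if the model emitted a non-trivial amount of text
    before </think>, treat the completion as slow (deep) thinking."""
    match = re.search(r"^(.*?)</think>", completion, flags=re.S)
    reasoning = match.group(1) if match else completion
    return len(reasoning.strip(" .\n")) >= min_reasoning_chars


if __name__ == "__main__":
    print(build_prompt("What is 2+3?"))
    # A fast-thinking completion closes the think block almost immediately.
    print(used_slow_thinking("\n</think>The answer is 5."))          # False
    print(used_slow_thinking("Let me verify: 2+3=5 ... </think>5"))  # True
```

A heuristic like this is only needed for logging or reward bookkeeping during training; at inference time the model simply continues from the ellipsis in whichever mode it prefers.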
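The three-stage reinforcement learning recipe can likewise be sketched as a stage-dependent reward. The function below is a toy illustration of the division of labor between the stages described above; the batch-balance penalty, the 0.2 length coefficient, and the 4096-token budget are all assumptions, and the paper's actual reward design may differ.

```python
def autothink_reward(correct: bool, used_thinking: bool, num_tokens: int,
                     stage: int, batch_think_ratio: float) -> float:
    """Toy stage-dependent reward shaping, a sketch of the three-stage recipe
    described in the article (not the paper's exact formulation).

    stage 1: keep both modes alive -- discourage collapse into always or
             never thinking by penalizing an extreme think ratio per batch.
    stage 2: reward correctness in whichever mode the model chose.
    stage 3: keep rewarding correctness, but subtract a small length
             penalty so the remaining reasoning becomes more concise.
    """
    accuracy_reward = 1.0 if correct else -1.0

    if stage == 1:
        # Penalize mode collapse: ideal is a mixed batch (ratio near 0.5).
        balance_penalty = abs(batch_think_ratio - 0.5)
        return accuracy_reward - balance_penalty
    if stage == 2:
        return accuracy_reward
    # Stage 3: brevity pressure, scaled by an assumed 4096-token budget.
    length_penalty = 0.2 * min(num_tokens / 4096, 1.0)
    return accuracy_reward - length_penalty


if __name__ == "__main__":
    # Early training: correct answer, but the batch has collapsed to thinking.
    print(autothink_reward(True, True, 60, stage=1, batch_think_ratio=0.9))
    # Final stage: a correct but verbose answer pays a small length cost.
    print(autothink_reward(True, True, 3000, stage=3, batch_think_ratio=0.5))
```

Separating stabilization, accuracy, and brevity into successive stages reflects the article's claim that the model must first learn to sustain both modes before it can be pushed to use them accurately and concisely.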
These results contrast with most open-source reasoning models, which gain performance only at the cost of significantly longer reasoning, and with concise-thinking models, which show little to no improvement over the baseline and sometimes even a decline in performance. Notably, even on the heavily trained DeepScaleR model, AutoThink still saved an additional 10% of token consumption.

By combining the ellipsis prompt with a three-stage reinforcement learning protocol, AutoThink offers a new inference paradigm: it moves away from one-size-fits-all reasoning, letting models think deeply only when necessary and express themselves more concisely. On various mathematical datasets it has demonstrated an excellent balance between accuracy and efficiency, improving performance while conserving computational resources, which showcases the adaptability and practicality of the approach.

AutoThink has been integrated into ScienceOne, a comprehensive platform for intelligent scientific research, and will be used to train ScienceOne's foundation model S1-Base. According to the research team, making large models "think smarter and express themselves more succinctly" is a crucial direction for the future development of scientific foundation models, and they hope this work will pave the way for more intelligent and efficient AI systems. The paper, code, and model links are provided for readers who want to explore further.