Command Palette
Search for a command to run...
마인드셋의 사슬: 적응형 인지 모드를 통한 추론
마인드셋의 사슬: 적응형 인지 모드를 통한 추론
초록
인간의 문제 해결 능력은 단일한 사고 방식의 반복이 결코 아니다. 여기서 말하는 '사고 방식'이란 명확한 인지 처리 방식을 의미한다. 특정 과제를 해결할 때 우리는 단일한 사고 방식에만 의존하는 것이 아니라, 하나의 해결 과정 내에서 다양한 사고 방식을 통합적으로 활용한다. 그러나 기존의 대규모 언어 모델(LLM)의 추론 방법은 일반적으로 같은 고정된 사고 방식을 모든 단계에 동일하게 적용하는 공통된 함정에 빠져 있다. 이는 동일한 문제를 해결하는 과정에서 각 단계가 본질적으로 다른 사고 방식을 필요로 함을 간과하고 있다는 점에서 문제를 지닌다. 이러한 단일 사고 방식에 대한 고정된 가정은 모델이 더 높은 수준의 지능에 도달하는 것을 방해한다. 이러한 한계를 극복하기 위해, 우리는 단계별로 적응 가능한 사고 방식을 조율할 수 있는 훈련 없이도 활용 가능한 에이전트 기반 프레임워크인 사고 방식의 체인(Chain of Mindset, CoM)을 제안한다. CoM은 추론 과정을 네 가지 기능적으로 다样的 사고 방식으로 분해한다: 공간적(Spatial), 수렴적(Convergent), 발산적(Divergent), 알고리즘적(Algorithmic). 메타에이전트(Meta-Agent)는 추론 상태의 변화에 따라 최적의 사고 방식을 동적으로 선택하며, 양방향 컨텍스트 게이트(Context Gate)는 모듈 간 정보 흐름을 필터링함으로써 효율성과 효과성을 유지한다. 수학, 코드 생성, 과학적 질의응답(QA), 공간적 추론 등 다양한 도전적인 벤치마크 6개에서 수행된 실험 결과, CoM은 최강의 기준 모델 대비 Qwen3-VL-32B-Instruct와 Gemini-2.0-Flash에서 각각 4.96%, 4.72%의 전체 정확도 향상을 기록하며 최고 수준의 성능을 달성했으며, 추론 효율성과의 균형도 잘 유지하였다. 본 연구의 코드는 공개되어 있으며, 아래 링크에서 확인할 수 있다:https://github.com/QuantaAlpha/chain-of-mindset
One-sentence Summary
Researchers from PKU, BJTU, QuantaAlpha, and others propose Chain of Mindset (CoM), a training-free framework that dynamically selects among four cognitive mindsets during reasoning, outperforming baselines by nearly 5% on key benchmarks while maintaining efficiency, advancing LLM problem-solving beyond fixed-mode approaches.
Key Contributions
- We identify a critical limitation in current LLM reasoning: the reliance on a single fixed mindset across all steps, despite human problem-solving requiring dynamic transitions between distinct cognitive modes such as Spatial, Convergent, Divergent, and Algorithmic thinking.
- We introduce Chain of Mindset (CoM), a training-free agentic framework that dynamically selects the optimal mindset per reasoning step via a Meta-Agent and controls cross-module information flow using a bidirectional Context Gate to preserve efficiency and effectiveness.
- CoM achieves state-of-the-art results across six diverse benchmarks, improving overall accuracy by 4.96% and 4.72% over the strongest baselines on Qwen3-VL-32B-Instruct and Gemini-2.0-Flash, while maintaining computational efficiency without model retraining.
Introduction
The authors leverage cognitive science to address a key limitation in LLM reasoning: current methods apply a single fixed mindset across all reasoning steps, despite human problem-solving dynamically switching between distinct cognitive modes. Prior approaches either lock models into one strategy or select a static method at task onset, failing to adapt as subtask demands evolve. Their main contribution is Chain of Mindset (CoM), a training-free agentic framework that dynamically orchestrates four functionally distinct mindsets—Spatial, Convergent, Divergent, and Algorithmic—using a Meta-Agent that selects the optimal mode per step. A bidirectional Context Gate filters information flow to maintain efficiency and reduce interference, enabling state-dependent switching without retraining. CoM achieves state-of-the-art results across six benchmarks, outperforming baselines by up to 4.96% while generalizing across models and preserving computational efficiency.
Dataset
-
The authors evaluate CoM on six benchmarks across four categories, sourced from recent academic datasets and competitions.
-
Mathematical Reasoning includes:
- AIME 2025: 30 problems covering algebra, geometry, combinatorics, and number theory.
- Real-Fermi: 557 Fermi estimation problems requiring order-of-magnitude reasoning.
-
Code Generation uses LiveCodeBench: 182 programming problems from LeetCode, AtCoder, and CodeForces (Jan–May 2025), categorized as 45 Easy, 55 Medium, and 82 Hard.
-
Science QA features GPQA-Diamond: a curated subset of 198 PhD-level questions in physics, chemistry, and biology, selected because non-experts score only ~30% accuracy.
-
Multimodal Reasoning includes:
- MathVision-Mini: 152 multimodal math questions requiring diagram interpretation before symbolic solving.
- MAZE: 200 maze navigation tasks generated via maze-dataset, where models predict final position after executing a given action sequence on a maze image.
-
No training data is used from these benchmarks; they serve strictly for evaluation. All datasets are used in their original or minimally filtered forms as specified by their respective authors. No cropping or metadata construction is mentioned for these test sets.
Method
The authors leverage a three-layer decoupled architecture called Chain of Mindset (CoM) to enable dynamic, multi-modal reasoning in large language models. This framework separates meta-cognitive orchestration from concrete execution, allowing the system to switch between functionally heterogeneous reasoning paradigms—termed “mindsets”—based on intermediate progress and semantic context. The core components are the Meta-Agent, four specialized Mindset modules, and a bidirectional Context Gate that mediates information flow to prevent context pollution.
The Meta-Agent serves as the central controller, responsible for generating cognitive decisions that define which mindset to invoke at each step. It operates in an iterative Plan-Call-Internalize loop: given the current state st=(q,H<t), it selects a mindset mt via policy π, dispatches a corresponding call ct, receives the output and insight, and internalizes the insight to potentially revise its plan. This enables self-correction and adaptive re-planning during complex reasoning trajectories.
As shown in the figure below, the framework dynamically selects the optimal cognitive mode at each step, contrasting with static strategy selection or single-mode reasoning. The four mindsets—Algorithmic, Spatial, Convergent, and Divergent—each encapsulate distinct cognitive strategies and operate within isolated contexts. The Algorithmic Mindset handles precise calculations via a generate-execute-repair loop, iterating up to Nmax=2 times to fix code errors:
(ρi+1,ralgo)=⎩⎨⎧(ρi,EXEC(ρi))(FIX(ρi,ϵi),⊥)(ρi,ϵi)if execution succeedsif error ϵi∧i<NmaxotherwiseThe Spatial Mindset bridges abstract logic and visual intuition by generating or editing images via Nano-Banana-Pro, supporting three modes: Text→Image, Image+Text→Image, and Code→Image. Generated artifacts are registered with unique identifiers (e.g., [GEN_001]) for later reference. The Convergent Mindset performs focused, deep reasoning grounded in established facts, producing a complete logical derivation. The Divergent Mindset breaks deadlocks by generating k∈[2,5] candidate solution branches, each analyzed in parallel, with results returned to the Meta-Agent for deliberation.
To address the Relevance-Redundancy Trade-off in modular reasoning, the Context Gate implements bidirectional semantic filtering. The Input Gate extracts a minimal sufficient context subset Hrel and relevant images Iini from the full history H, using the call instruction c as a semantic anchor:
(Hrel,Iini)=Gin(H,c,M,I)The Output Gate distills verbose mindset outputs r into concise summaries Osum aligned with the instruction’s goal:
Osum=Gout(r,c,Inew)This ensures high information density in both directions, enabling efficient execution within isolated mindset environments while preserving the compactness of the main reasoning chain. The Meta-Agent internalizes these distilled insights to guide subsequent decisions, forming a closed-loop cognitive architecture capable of dynamic adaptation.
Experiment
- CoM demonstrates adaptive mindset switching for complex reasoning, using Spatial for visual grounding, Convergent for ambiguity resolution, and Algorithmic for precise computation.
- It outperforms baselines across multiple benchmarks, showing strongest gains on tasks requiring flexible strategy adaptation, such as Fermi estimation and spatial reasoning.
- Ablation studies confirm the Context Gate is critical for coordination, while Divergent and Spatial mindsets drive performance on mathematical and visual tasks, respectively.
- CoM achieves state-of-the-art accuracy with moderate computational cost, placing it on the Pareto frontier of accuracy-efficiency trade-offs.
- Mindset invocation patterns reveal task-specific collaboration: Fermi and code tasks favor Algorithmic-Convergent pairs, while multimodal tasks heavily rely on Spatial reasoning.
- Dynamic re-planning is a core strength—CoM revises its strategy mid-process based on intermediate insights, enabling more efficient problem solving than static meta-reasoning approaches.
The authors use CoM to dynamically switch between cognitive mindsets during reasoning, achieving the highest overall accuracy across multiple benchmarks by adapting strategies based on problem context. Results show that removing key components like the Context Gate or Divergent mindset significantly degrades performance, confirming that coordinated, adaptive reasoning is essential for complex tasks. CoM also demonstrates efficiency gains by invoking only necessary mindsets per task, balancing accuracy with computational cost.

The authors use CoM to dynamically switch between cognitive mindsets during reasoning, achieving the highest overall accuracy across multiple benchmarks compared to both direct and meta-reasoning baselines. Results show that CoM particularly excels on tasks requiring flexible strategy adaptation, such as mathematical reasoning and multimodal spatial problems, while maintaining efficiency in token usage. The framework’s strength lies in its ability to coordinate specialized reasoning modes—like Algorithmic for computation and Spatial for visualization—based on problem context rather than relying on fixed or uniform strategies.

The authors use CoM to dynamically switch between cognitive mindsets during reasoning, achieving the highest overall accuracy across multiple benchmarks compared to both direct and meta-reasoning baselines. Results show that CoM particularly excels on tasks requiring flexible strategy adaptation, such as mathematical reasoning and multimodal spatial problems, while maintaining efficiency in token usage. The method’s strength lies in its ability to coordinate specialized reasoning modes—like Algorithmic for computation and Spatial for visualization—based on problem context rather than relying on fixed or uniform strategies.

The authors use CoM to dynamically switch between cognitive mindsets—Divergent, Convergent, Algorithmic, and Spatial—based on problem requirements, with most tasks invoking multiple mindsets to achieve optimal reasoning. Results show that Algorithmic and Convergent mindsets are most frequently engaged overall, while Spatial is critical for visual tasks like MAZE and MathVision, and Divergent plays a key role in mathematical reasoning such as AIME25. The multi-mindset collaboration enables adaptive problem-solving, with 59.7% of problems requiring at least two distinct modes for effective resolution.
