Slow Perception

Slow perception is a technique in artificial intelligence for improving the visual reasoning ability of large multimodal models. It was jointly proposed in 2025 by the StepFun team and the Beijing University of Aeronautics and Astronautics. It aims to achieve fine-grained perception of geometric figures by splitting the perception process into steps, thereby improving the performance of large multimodal models on visual reasoning tasks. The related paper is "Slow Perception: Let's Perceive Geometric Figures Step-by-step".

Slow perception is divided into two stages:

  • Perception Decomposition: Geometric figures are decomposed into basic shape units, namely lines. This unifies the representation of complex geometry, avoids multimodal optimization problems, and achieves the goal of "simplifying the complex". It also avoids errors the model may make when processing complex figures, such as problems with nested polygons.
  • Perception Flow: Using a virtual perception ruler, the model traces each line segment gradually from its start point to its end point. Perceiving a long line segment is modeled as moving from one decision point to the next through multiple saccades. This introduces inference-time scaling at the perception level and improves the model's accuracy in predicting line segments.
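The two stages can be illustrated geometrically with a minimal Python sketch. It is not the authors' implementation: slow perception is a way of training and prompting a multimodal model, whereas the code below only mimics the output structure on plain coordinates. The function names `decompose_polygon` and `perception_flow`, the `ruler` parameter, and the rule of taking equal steps of at most one ruler length are all assumptions made for illustration.

```python
import math

Point = tuple[float, float]
Segment = tuple[Point, Point]


def decompose_polygon(vertices: list[Point]) -> list[Segment]:
    """Perception decomposition (illustrative): a closed polygon
    is represented only by its edges, i.e. a list of line segments."""
    n = len(vertices)
    return [(vertices[i], vertices[(i + 1) % n]) for i in range(n)]


def perception_flow(start: Point, end: Point, ruler: float) -> list[Point]:
    """Perception flow (illustrative): trace a segment from start to end,
    stopping at a decision point after every step of length `ruler`.
    The last step may be shorter so that the trace lands exactly on `end`."""
    if ruler <= 0:
        raise ValueError("ruler must be positive")
    (x0, y0), (x1, y1) = start, end
    length = math.dist(start, end)
    steps = max(1, math.ceil(length / ruler))  # number of saccades
    points = [start]
    for i in range(1, steps):
        t = (i * ruler) / length  # fraction of the segment covered so far
        points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    points.append(end)
    return points


if __name__ == "__main__":
    square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
    for seg_start, seg_end in decompose_polygon(square):
        print(perception_flow(seg_start, seg_end, ruler=4.0))
```

In this sketch a segment of length 10 traced with a ruler of length 4 yields two intermediate decision points before the end point, so a long line is never predicted in a single jump.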

By simulating the step-by-step way humans parse geometric figures, slow perception significantly improves a model's ability to parse complex geometric figures. The method not only shows significant performance gains in experiments but also reveals an inference-time scaling law: parsing accuracy improves as more computation is spent on perception. This finding offers a new direction for geometric figure parsing tasks in computer vision.
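The cost side of this trade-off can be made concrete with a little arithmetic. The sketch below assumes, for illustration only, that the extra computation comes from using a shorter perception ruler, so that a segment of a given length needs more saccades (and therefore more predicted points). The function name `saccades_needed` is hypothetical, and the code says nothing about accuracy, which has to be measured experimentally.

```python
import math


def saccades_needed(segment_length: float, ruler: float) -> int:
    """Number of steps needed to trace a segment when each step
    covers at most `ruler` units (illustrative assumption)."""
    if ruler <= 0:
        raise ValueError("ruler must be positive")
    return max(1, math.ceil(segment_length / ruler))


# Steps required for a segment of length 12 under three ruler lengths.
costs = {ruler: saccades_needed(12.0, ruler) for ruler in (12.0, 8.0, 4.0)}

if __name__ == "__main__":
    for ruler, steps in costs.items():
        print(f"ruler={ruler}: {steps} step(s)")
```

Under this assumption, shrinking the ruler from 12 to 4 units triples the number of steps for the same segment, which is the sense in which perception becomes "slower".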