Multimodal Visualization-of-Thought

Multimodal Visualization-of-Thought (MVoT) is a method proposed in January 2025 by researchers from Microsoft Research, the University of Cambridge, and the Chinese Academy of Sciences. It combines multiple modalities (such as vision, hearing, touch, and language) to display and understand the thinking process. The research was published in the paper "Imagine while Reasoning in Space: Multimodal Visualization-of-Thought". By having different modalities (such as images, text, sound, and action) work together, the method aims to present thinking, decision-making, and information processing more intuitively and comprehensively.