China Unveils SpikingBrain-1.0: A Breakthrough in Brain-Inspired AI
Recently, researchers from the Institute of Automation, Chinese Academy of Sciences, led by Li Guoqi and Xu Bo, in collaboration with Muxi MetaX, unveiled SpikingBrain-1.0, a brain-inspired spiking large model built on their original theory of endogenous complexity. The model was trained and deployed entirely on a domestic thousand-GPU computing platform and achieves order-of-magnitude improvements in the efficiency and speed of ultra-long-sequence reasoning. This milestone demonstrates the feasibility of building a domestic, independent, and controllable ecosystem for a new generation of non-Transformer large model architectures.

Building on their foundational theoretical work, the research team introduced a novel approach to artificial intelligence. Instead of relying on the traditional “exogenous complexity” paradigm, in which model intelligence is enhanced by scaling up network size, computational resources, and data volume, the team developed an “endogenous complexity” framework inspired by the intrinsic dynamics of biological neurons. This new paradigm replaces the simplistic point-neuron units of existing Transformer models with spiking neurons that exhibit complex internal dynamics, offering a fundamentally different path to intelligent systems. The team established a theoretical link between the intrinsic dynamics of spiking neurons and linear attention mechanisms, revealing that current linear attention models are merely special, simplified forms of dendritic computation. This insight opens a clear and scalable route to progressively increasing model complexity and performance.

Based on this framework, the team developed and open-sourced SpikingBrain-1.0-7B, a model with linear complexity, and SpikingBrain-1.0-76B, a hybrid model with 12 billion activated parameters, both designed for high efficiency and scalability. To support deployment on domestic hardware, the team built a full-stack solution that includes a training and inference framework tailored to the Muxi MetaX Xiyun C550 GPU cluster, a Triton operator library, model parallelization strategies, and cluster communication primitives. This marks a major step toward a self-reliant, domestically supported AI infrastructure.
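To make the connection between spiking dynamics and linear attention more concrete, the sketch below shows a generic decayed linear-attention recurrence in which each token updates a fixed-size state matrix, so memory does not grow with sequence length; the decay term plays a role loosely analogous to a leaky membrane potential. This is a minimal illustration under common assumptions about linear attention, not the actual SpikingBrain-1.0 formulation, and all names, shapes, and parameter values are illustrative.

```python
# Minimal sketch (not the SpikingBrain-1.0 implementation): decayed linear attention
# decoded token by token. The recurrent state is a fixed-size matrix, so per-token
# memory and compute stay constant regardless of sequence length.
import numpy as np

def linear_attention_decode(queries, keys, values, decay=0.95):
    """Token-by-token decoding with a fixed-size (d_k x d_v) state."""
    d_k, d_v = keys.shape[1], values.shape[1]
    state = np.zeros((d_k, d_v))                # fixed-size memory, independent of length
    outputs = []
    for q, k, v in zip(queries, keys, values):
        state = decay * state + np.outer(k, v)  # leaky accumulation of key-value pairs
        outputs.append(q @ state)               # read out with the current query
    return np.stack(outputs)

# Toy usage: 1,000 tokens with 64-dimensional heads; the state stays 64x64 throughout.
rng = np.random.default_rng(0)
T, d = 1000, 64
out = linear_attention_decode(rng.normal(size=(T, d)),
                              rng.normal(size=(T, d)),
                              rng.normal(size=(T, d)))
print(out.shape)  # (1000, 64)
```

Because the state has the same size no matter how many tokens have been processed, per-token decoding cost and memory remain constant, which is the property behind the constant-memory inference results described below.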
SpikingBrain-1.0 achieves breakthroughs across multiple dimensions. First, it enables highly efficient training with minimal data: training complexity is linear or near-linear, allowing the model to match the performance of many open-source Transformer models on multi-task language understanding (MMLU), Chinese multi-task understanding (CMMLU, C-Eval), and commonsense reasoning (ARC, HS) benchmarks while using only about 2% of the pretraining data typically required by mainstream models.

Second, the model delivers an order-of-magnitude improvement in inference efficiency. Leveraging the event-driven nature of spiking neurons, SpikingBrain achieves constant or partially constant complexity and memory usage during inference. The SpikingBrain-1.0-7B model demonstrates a 26.5x speedup in time-to-first-token (TTFT) at 1 million tokens and an over 100x speedup at 4 million tokens compared with Transformer-based models. On mobile CPUs, it achieves decoding speedups of 4.04x to 15.39x over a Llama3.2 model of similar scale at sequence lengths of 64k, 128k, and 256k tokens.

Third, the project lays the foundation for a domestic, independent, non-Transformer large model ecosystem by supporting the entire training and inference pipeline on Chinese-made GPU hardware.

Fourth, the team introduced a multi-scale sparsity mechanism based on dynamic-threshold spiking, combining a fine-grained two-stage dynamic-threshold strategy with a coarse-grained Mixture-of-Experts (MoE) design. This yields over 69.15% sparsity in the 7B model, with spike activity of only about 1.85% on long sequences, a property that is critical for low-power, brain-inspired computing (a toy sketch of adaptive-threshold spiking appears at the end of this article).

This is the first time China has proposed a large-scale brain-inspired linear foundational model architecture and successfully carried out training and inference of such a model on a domestic GPU cluster. The model overcomes the performance-degradation challenges of large-scale spiking networks and shows exceptional potential in ultra-long-sequence tasks such as legal and medical document analysis, complex multi-agent simulations, high-energy physics experiments, DNA sequence modeling, and molecular dynamics trajectory prediction. The release of SpikingBrain-1.0 offers a new technological pathway beyond the Transformer architecture for next-generation AI and is expected to inspire future research in low-power neuromorphic computing theory and chip design.

For further details, please refer to the technical reports and resources:
1) Online demo access
2) Chinese technical report
3) English technical report
4) Model source code
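As a rough illustration of the dynamic-threshold spiking idea mentioned in the fourth point above, the toy sketch below shows an adaptive-threshold leaky integrate-and-fire neuron whose firing threshold rises after each spike and relaxes back over time, so most timesteps emit no event. This is a generic, textbook-style mechanism written for illustration only; the parameter names, values, and update rules are assumptions and do not reproduce SpikingBrain-1.0's actual two-stage dynamic-threshold scheme.

```python
# Toy adaptive-threshold leaky integrate-and-fire (LIF) neuron.
# Illustrative only: parameters and update rules are assumptions, not the
# SpikingBrain-1.0 mechanism. The point is that an adaptive threshold keeps
# most timesteps silent, producing sparse, event-driven activity.
import numpy as np

def adaptive_lif(inputs, leak=0.9, base_threshold=1.5, adapt=0.5, adapt_decay=0.95):
    potential = 0.0      # leaky membrane potential
    boost = 0.0          # adaptive component added to the firing threshold
    spikes = []
    for x in inputs:
        potential = leak * potential + x         # integrate input with leak
        if potential >= base_threshold + boost:  # dynamic threshold check
            spikes.append(1)
            potential = 0.0                      # reset after a spike
            boost += adapt                       # raise the threshold (adaptation)
        else:
            spikes.append(0)
        boost *= adapt_decay                     # threshold relaxes back over time
    return np.array(spikes)

# Toy usage: random input current; the printed rate is the fraction of
# timesteps that emitted a spike (a low rate means high sparsity).
rng = np.random.default_rng(0)
spike_train = adaptive_lif(rng.normal(0.1, 1.0, size=10_000))
print(f"spike rate: {spike_train.mean():.2%}")
```

In a large model, analogous event-driven mechanisms determine when a unit contributes computation at all, which is the kind of behavior behind the sparsity figures reported above.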