
Native Sparse Attention

Native Sparse Attention (NSA) is a natively trainable sparse attention mechanism proposed by DeepSeek, Peking University, and the University of Washington on February 27, 2025. It aims to solve the computational bottleneck of long-sequence modeling by combining algorithmic innovation with hardware-aligned optimization to achieve efficient long-context modeling. The related paper, "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention," won the ACL 2025 Best Paper Award.
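The sketch below is only a simplified illustration of the general idea behind sparse attention: each query attends to a small, dynamically selected subset of key/value blocks rather than to every token. It is not DeepSeek's actual NSA design, which additionally combines compressed-token and sliding-window branches with learned gating and hardware-optimized kernels; the block size and top-k values here are hypothetical.

```python
# Illustrative sketch only: simplified blockwise top-k sparse attention,
# not the NSA kernel itself. block_size and top_k are hypothetical.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def block_sparse_attention(q, k, v, block_size=64, top_k=4):
    """Each query attends only to the top-k key blocks ranked by a coarse
    block-level score, instead of all keys as in full attention."""
    seq_len, d = k.shape
    n_blocks = seq_len // block_size
    # Coarse block summaries: mean-pooled keys per block.
    k_blocks = k[: n_blocks * block_size].reshape(n_blocks, block_size, d).mean(axis=1)
    out = np.empty_like(q)
    for i, qi in enumerate(q):
        # Rank blocks by the query's similarity to each block summary.
        block_scores = k_blocks @ qi / np.sqrt(d)
        chosen = np.argsort(block_scores)[-top_k:]
        # Gather only the selected blocks' keys/values and attend over them.
        idx = np.concatenate(
            [np.arange(b * block_size, (b + 1) * block_size) for b in chosen]
        )
        attn = softmax(k[idx] @ qi / np.sqrt(d))
        out[i] = attn @ v[idx]
    return out

# Toy usage: 1024 tokens with 64-dim heads; each query touches 4 of 16 blocks.
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((1024, 64)).astype(np.float32) for _ in range(3))
print(block_sparse_attention(q, k, v).shape)  # (1024, 64)
```

Because each query reads only `top_k * block_size` keys instead of the full sequence, the cost of the attention step no longer grows with the full context length, which is the source of the speedups reported for long sequences.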

Pre-trained on a 27B-parameter Transformer backbone, NSA matches or exceeds full attention models on general benchmarks, long-context tasks, and reasoning tasks. When processing 64k-length sequences, NSA achieves substantial speedups over full attention in decoding, forward propagation, and backpropagation.