
Star Attention Block Sparse Attention Mechanism


Star Attention is a block-sparse attention mechanism proposed by NVIDIA in 2024, designed to improve the inference efficiency of Transformer-based large language models (LLMs) on long sequences. Through a two-phase processing flow, the mechanism speeds up inference substantially and makes better use of computing resources while maintaining high accuracy.

The mechanism was introduced in the paper "Star Attention: Efficient LLM Inference over Long Sequences", which details its working principle and advantages. Star Attention operates in two phases: the first phase is context encoding, in which the context is split into blocks and processed with blockwise-local attention; the second phase is query processing and token generation, in which the query and newly generated tokens attend to all previously cached tokens. Star Attention reduces memory requirements and inference time by up to 11 times while maintaining 95-100% accuracy.
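To make the two-phase flow more concrete, below is a minimal single-head NumPy sketch. It is not NVIDIA's implementation: the helper names (blockwise_encode, query_over_blocks), the block size, and the random matrices are assumptions for illustration, and phase 1 is reduced to partitioning an already-computed key/value cache rather than running blockwise-local attention with anchor blocks on separate hosts. What it does show is how phase 2 can combine per-block attention outputs using their log-sum-exp statistics, so the merged result equals global attention over the whole cache without ever gathering all scores in one place.

```python
import numpy as np

def blockwise_encode(keys, values, block_size):
    """Phase 1 (context encoding), simplified: partition the cached keys and
    values into contiguous blocks. In the real mechanism each host would build
    its block's KV cache with local self-attention over (anchor block + block);
    here we only split an existing cache to keep the sketch short."""
    blocks = []
    for start in range(0, len(keys), block_size):
        blocks.append((keys[start:start + block_size],
                       values[start:start + block_size]))
    return blocks

def query_over_blocks(query, kv_blocks):
    """Phase 2 (query processing): the query attends to every block's cache.

    Each block contributes a locally normalized attention output plus the log
    of its softmax normalizer; merging with log-sum-exp weights reproduces
    exact softmax attention over the concatenated cache."""
    partial_outputs, log_norms = [], []
    for k_blk, v_blk in kv_blocks:
        scores = query @ k_blk.T / np.sqrt(query.shape[-1])   # (1, block)
        m = scores.max()
        exp_scores = np.exp(scores - m)
        z = exp_scores.sum()
        partial_outputs.append((exp_scores @ v_blk) / z)       # local softmax output
        log_norms.append(m + np.log(z))                        # log of local normalizer
    log_norms = np.array(log_norms)
    weights = np.exp(log_norms - log_norms.max())
    weights = weights / weights.sum()                           # each block's share of the global softmax
    return sum(w * o for w, o in zip(weights, partial_outputs))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n, block_size = 16, 64, 16                               # toy sizes, not from the paper
    K = rng.standard_normal((n, d))
    V = rng.standard_normal((n, d))
    q = rng.standard_normal((1, d))

    blocks = blockwise_encode(K, V, block_size)
    out = query_over_blocks(q, blocks)

    # Reference: dense attention over the full cache; the merged result matches.
    scores = q @ K.T / np.sqrt(d)
    probs = np.exp(scores - scores.max())
    dense = (probs / probs.sum()) @ V
    print("matches dense attention:", np.allclose(out, dense))
```

In this simplified setting the merged phase-2 output matches dense attention exactly; in Star Attention itself the approximation comes from phase 1, where each context block attends only locally rather than to the whole prefix, which is what removes the quadratic cost over the long context.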
