Grouped-query Attention (GQA)
Date: 2 years ago
Grouped-Query Attention (GQA) is a method that interpolates between Multi-Query Attention (MQA) and Multi-Head Attention (MHA) in Large Language Models (LLMs). Its goal is to achieve the quality of MHA while retaining most of the inference speed of MQA.
Key attributes of GQA include:
- Interpolation: GQA is an intermediate method between MQA and MHA that addresses the shortcomings of MQA, such as quality degradation and training instability.
- Efficiency: GQA reduces memory and bandwidth costs while preserving quality by using an intermediate number of key-value heads, shared across groups of query heads.
- Trade-off: GQA strikes a balance between the speed of MQA and the quality of MHA, providing a favorable trade-off.
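The grouping idea above can be sketched concretely: each key-value head is shared by a group of query heads, so setting the number of KV heads to 1 recovers MQA and setting it equal to the number of query heads recovers MHA. Below is a minimal NumPy sketch of this mechanism (function names, shapes, and the absence of masking/projections are simplifying assumptions, not a reference implementation):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def grouped_query_attention(q, k, v, n_q_heads, n_kv_heads):
    """Toy GQA (no mask, no output projection).

    q: (seq, n_q_heads, d); k, v: (seq, n_kv_heads, d).
    Each KV head serves a group of n_q_heads // n_kv_heads query heads.
    """
    assert n_q_heads % n_kv_heads == 0
    group = n_q_heads // n_kv_heads
    d = q.shape[-1]
    # Broadcast each KV head to every query head in its group.
    k = np.repeat(k, group, axis=1)  # (seq, n_q_heads, d)
    v = np.repeat(v, group, axis=1)
    # Scaled dot-product attention, computed per head.
    scores = np.einsum('qhd,khd->hqk', q, k) / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return np.einsum('hqk,khd->qhd', weights, v)
```

Note that `n_kv_heads=1` gives MQA (one shared KV head) and `n_kv_heads=n_q_heads` gives MHA; GQA sits anywhere in between, which is exactly the interpolation described above. In a real model the KV cache shrinks by the grouping factor, which is where the inference speedup comes from.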