HyperAI

Researchers Unveil GPUHammer: New Rowhammer Attack Corrupts AI Models on NVIDIA GDDR6 GPUs

2 days ago

Researchers at the University of Toronto have unveiled GPUHammer, an attack that can silently corrupt AI models running on NVIDIA GPUs by flipping bits in GPU memory. The attack exploits Rowhammer, a known hardware vulnerability previously demonstrated against the DRAM used as system memory (RAM) alongside CPUs. Rowhammer works by repeatedly accessing a specific row of memory, which creates electrical interference that flips bits in adjacent rows. These bit flips alter the stored data, including the weights of neural networks, and can thereby degrade or break AI models.

How GPUHammer Works

The researchers demonstrated GPUHammer on a real NVIDIA RTX A6000 GPU, showing that a single bit flip could reduce an AI model's accuracy from 80% to under 1%. The attack repeatedly hammers memory rows in GDDR6 VRAM, which is common in many modern NVIDIA GPUs, particularly those used in workstations and servers. The concerning aspect is that GPUHammer doesn’t require direct access to the victim's data or code. An attacker only needs to share the same GPU in a cloud or server environment to execute the attack and disrupt the targeted model.

Affected GPUs

While the attack was tested on the RTX A6000, the vulnerability extends to a wide range of NVIDIA GPUs, including those based on the Turing, Ampere, Ada, and Hopper architectures. However, some newer models, such as the RTX 5090 and H100, have built-in Error Correction Code (ECC), which can automatically detect and correct these bit flips, offering protection against GPUHammer.

Impact and Mitigation

The primary targets of GPUHammer are shared GPU environments, such as cloud gaming servers, AI training clusters, and Virtual Desktop Infrastructure (VDI) setups, where multiple users or applications run on the same hardware. In these settings, the attack could lead to severe consequences, including degraded performance, incorrect model outputs, and potential security breaches.

NVIDIA has recommended enabling ECC on vulnerable GPUs to mitigate the risk. ECC adds redundancy to memory, allowing it to detect and correct errors caused by bit flips. The trade-off is that machine learning tasks run about 10% slower and 6-6.5% of VRAM is no longer usable.

Enabling ECC

To enable ECC, users can run the following nvidia-smi command. Changing the ECC mode requires administrator privileges and takes effect after the next reboot:

    nvidia-smi -e 1

To verify whether ECC is active, run:

    nvidia-smi -q | grep ECC

Industry Implications

GPUHammer underscores the importance of memory safety in GPUs, especially as they become more integral to AI, creative work, and productivity tasks. In regulated industries such as healthcare, finance, and autonomous driving, tampering with AI models can lead to critical errors, security issues, and legal liabilities. Although the average home user is unlikely to face direct threats from GPUHammer, the discovery serves as a wake-up call for the broader tech community. As GPUs continue to evolve and play a larger role in various computing domains, robust security measures, including memory integrity, are no longer optional.

Industry Reactions and Company Profiles

Experts in the cybersecurity and AI fields have praised NVIDIA for its swift response and clear mitigation guidelines. Andrew Martin, a cybersecurity professor at Oxford University, noted that the discovery highlights the need for continuous vigilance in hardware security. "This attack is a vivid reminder that memory vulnerabilities extend beyond traditional system memory to GPUs, and it’s crucial for hardware manufacturers and users alike to stay proactive," Martin said.

NVIDIA, founded in 1993 and headquartered in Santa Clara, California, is a leader in the graphics processing unit (GPU) market. Known for its powerful GPUs used in gaming, professional visualization, data centers, and AI, NVIDIA has been a pivotal player in advancing GPU technology. The company’s quick action to address GPUHammer demonstrates its commitment to maintaining the integrity and security of its hardware, in line with the growing demands of the AI and cloud computing industries.
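
Illustration: Why One Bit Flip Matters

To make the single-bit-flip claim above more concrete, the following minimal Python sketch shows how flipping one bit in a stored number can change its value by orders of magnitude. The article does not say which number format or which bit the researchers targeted, so this is an assumption-laden illustration: it assumes weights stored as 16-bit IEEE 754 half-precision floats, uses a hypothetical weight of 0.5, and introduces a helper named flip_bit_fp16 purely for demonstration. It does not perform or reproduce the attack; it only simulates the effect of a flip on a value in software.

```python
import struct


def flip_bit_fp16(value: float, bit: int) -> float:
    """Return `value`, encoded as IEEE 754 half precision, with one bit toggled.

    Hypothetical helper for illustration only. Bit 15 is the sign,
    bits 14..10 are the exponent, and bits 9..0 are the mantissa.
    """
    if not 0 <= bit < 16:
        raise ValueError("bit must be in the range 0..15")
    # Encode the value into the 16-bit pattern that would sit in memory.
    (bits,) = struct.unpack("<H", struct.pack("<e", value))
    # XOR toggles exactly one bit, mimicking a single induced bit flip.
    flipped = bits ^ (1 << bit)
    # Decode the corrupted pattern back into a float.
    (result,) = struct.unpack("<e", struct.pack("<H", flipped))
    return result


weight = 0.5  # hypothetical weight value, not taken from the research
for bit in (0, 9, 14):
    print(f"flip bit {bit:2d}: {weight} -> {flip_bit_fp16(weight, bit)}")
```

Under these assumptions, flipping the lowest mantissa bit barely changes the weight, while flipping bit 14 (the most significant exponent bit) turns 0.5 into 32768.0. A single weight of that size can dominate a layer's output, which offers one plausible intuition for how one flip could collapse a model's accuracy.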
