HyperAI
Back to Headlines

Wuhan University Researchers Propose New Large Language Model Backdoor Attack Framework EmbedX

6 days ago

According to recent news from Wuhan University, a paper authored by Yan Nan, a 2023 master's student at the National Cybersecurity College, has been accepted for presentation at the 34th USENIX Security Symposium in 2025. The paper, titled "EmbedX: Embedding-Based Cross-Trigger Backdoor Attack Against Large Language Models," was co-authored with advisors Li Yucheng and Chen Jing, both associate professors and corresponding authors, as well as Associate Professor He Kun from the same college. The research also involved collaboration with Associate Professor Wang Xiong from Huazhong University of Science and Technology and Professor Li Bo from the Hong Kong University of Science and Technology. In recent years, large language models (LLMs) such as GPT-4 and LLaMA have shown remarkable performance in natural language processing tasks, including question answering, translation, and text generation. However, these models are not immune to security risks, particularly backdoor attacks. Attackers can implant specific trigger words during the training process, causing the model to produce malicious or incorrect responses when those triggers are encountered. Traditional backdoor methods, which rely on discrete trigger words, lack the capability to optimize automatically, making it difficult to find the most effective triggers for specific tasks. Furthermore, these methods are often limited to single trigger words, which may not adapt to diverse user language patterns and can become ineffective in cross-cultural and multilingual environments, necessitating retraining and embedding new backdoors—a process that is both inefficient and less covert. To address these limitations, the authors introduced EmbedX, an innovative framework for cross-trigger backdoor attacks based on embedding space. Unlike existing methods, EmbedX uses continuous embedding vectors to create "soft triggers" that can dynamically adjust and align with target outputs. This approach allows for the customization of triggers according to specific backdoor scenarios. Additionally, EmbedX leverages multiple tokens with varying linguistic styles, aligning them in the embedding semantic space to map to the same soft trigger. This ensures that different tokens can activate the same backdoor response, enhancing versatility and adapting to diverse languages and cultures. To improve the attack's stealth, EmbedX incorporates double constraints in the frequency domain and gradient space. These constraints make the poisoned samples appear more similar to normal ones in the model's latent space, reducing the likelihood of detection. The researchers tested EmbedX on several mainstream open-source large language models, including LLaMA, BLOOM, and Gemma, across six different language environments. The tasks included sentiment analysis, hate speech detection, and instruction generation. The results showed that EmbedX outperformed existing methods in terms of attack success rate, time efficiency, and stealth: it achieved near-perfect success rates, required an average of only 0.53 seconds to migrate triggers without retraining, and even improved model accuracy by 3.2%. The acceptance of this paper by USENIX Security highlights the significance of the research. The 34th USENIX Security Symposium, scheduled to take place in Seattle from August 13 to 15, 2025, is a prestigious event in the field of cybersecurity. Since its first edition in 1990, USENIX Security has established itself as one of the top four international academic conferences in cybersecurity, alongside IEEE S&P, ACM CCS, and NDSS. It is also a Class A conference recommended by the Chinese Computer Federation (CCF). This study underscores the potential vulnerabilities of LLMs at the semantic level and provides a theoretical foundation for developing more efficient and covert backdoor detection techniques in the future.

Related Links