
FlippedRAG: Black-Box Attack on LLMs Through RAG Manipulation

Recently, a research paper co-authored by Chen Zhuo, a doctoral student at the School of Information Management, Wuhan University, was accepted for presentation at the 32nd ACM Conference on Computer and Communications Security (ACM CCS 2025). The paper, titled "FlippedRAG: Black-Box Opinion Manipulation Attacks to Retrieval-Augmented Generation of Large Language Models," lists Chen Zhuo as first author. The research was jointly supervised by Professor Lu Wei, Associate Professor Cheng Qikai, Tenure-Track Assistant Professor Zhang Fan, Postdoctoral Researcher Liu Jiawei (corresponding author), and Associate Professor Liu Xiaozhong of Worcester Polytechnic Institute, USA. Other contributors include doctoral student Liu Haotan, master's student Chen Miao-kun, and undergraduate student Gong Yuyang, all from Wuhan University.

Retrieval-Augmented Generation (RAG) enhances large language models by grounding them in external knowledge bases, reducing hallucinations and improving answer accuracy. As RAG systems are increasingly deployed in real-world applications, their reliability and security have drawn growing attention. However, most existing studies of RAG security focus on white-box settings or are limited to factual question-answering tasks, leaving a critical gap in our understanding of risks under black-box conditions, especially in opinion-driven, controversial scenarios.

To address this gap, the paper introduces FlippedRAG, a novel transfer-based adversarial attack. The study reveals a serious security vulnerability in RAG systems: even with minimal malicious manipulation of retrieved documents, attackers can bias model outputs in a black-box setting, without any knowledge of the model's internal structure. By training a proxy retriever and crafting adversarial trigger texts, attackers can corrupt just a small number of documents to steer the model toward generating biased or manipulated opinions.
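To make the poisoning mechanism concrete, here is a deliberately simplified, hypothetical sketch (not the paper's actual method): a toy bag-of-words retriever stands in for the attacker's proxy retriever, and a query-mimicking "trigger" suffix is appended to an opinionated document so that it outranks benign documents and enters the retrieved context. All document texts, the scoring scheme, and the trigger construction below are illustrative assumptions.

```python
# Toy illustration of retrieval poisoning in a RAG pipeline.
# The trigger suffix, corpus, and scoring are all simplified assumptions,
# not the FlippedRAG algorithm itself.
from collections import Counter
import math

def embed(text):
    # Toy bag-of-words "embedding": lowercase term counts
    # (stands in for the attacker's proxy retriever encoder).
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, corpus, k=2):
    # Return the top-k documents ranked by similarity to the query.
    q = embed(query)
    ranked = sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

query = "is nuclear energy safe"

corpus = [
    "nuclear energy has a strong safety record overall",
    "renewable sources are growing quickly worldwide",
    "weather patterns affect solar output",
]

# Attacker appends a query-mimicking trigger so the opinionated
# document outranks benign documents under the proxy retriever.
trigger = "is nuclear energy safe nuclear energy safe"
poisoned = "nuclear energy is dangerous and should be avoided " + trigger

top = retrieve(query, corpus + [poisoned], k=2)
# The poisoned document now sits in the retrieved context that the
# LLM conditions on, biasing the generated opinion.
assert poisoned in top
```

Because the ranking shift transfers from the proxy retriever to the victim system's real retriever, the attacker never needs white-box access; that transferability is the core black-box ingredient the paper exploits.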
Experiments show that FlippedRAG achieves an attack success rate 16.7% higher on average than baseline methods and can flip the polarity of generated opinions by 50%. User studies further demonstrate that the attack induces a significant shift of roughly 20% in human participants' perceived viewpoints, underscoring its real-world impact. Importantly, FlippedRAG evades existing defense mechanisms, overcoming the limitations of prior attacks that rely on white-box assumptions or easily detectable heuristic strategies.

ACM CCS 2025 is scheduled to take place from October 13 to 17, 2025, in Taipei, China. Recognized alongside IEEE S&P, USENIX Security, and NDSS as one of the four top-tier international conferences in cybersecurity, ACM CCS is also ranked as a Category A conference by the China Computer Federation (CCF). Over the past decade, the conference has maintained an acceptance rate of approximately 18%, reflecting the highest level of cutting-edge research in the field of information security.
