HyperAI

4 months ago
LLM
Generative AI

AI Model Edits Can Leak Sensitive Data Through Update Fingerprints, Researchers Warn

A new study has revealed a significant security vulnerability in large language models (LLMs) related to model editing techniques. Researchers from the Shanghai Qi Zhi Institute, East China Normal University, Tsinghua University, and the Chinese Academy of Sciences have demonstrated that edits made to fix errors or remove sensitive information from LLMs can inadvertently expose that very information through what they call "update fingerprints": subtle patterns in the changes an edit makes to the model's internal parameters. Model editing, particularly the "locate-then-edit" approach, is designed to fix problems without full retraining. However, the researchers found that the resulting parameter updates can leak information about the original data that was edited.

The team developed a two-stage reverse-engineering attack called KSTER (Key Space Reconstruction-then-Entropy Reduction) that exploits the low-rank structure of these parameter updates. In the first stage, the attack uses spectral analysis to identify the row space of the update matrix, which acts as a fingerprint, and from it recovers the subject of the edit. In the second stage, it applies an entropy-based method to reconstruct the semantic context of the original input.

The researchers tested KSTER on several prominent LLMs, including GPT-J, Llama-3, and Qwen-2.5, and achieved high success rates in recovering sensitive data that had been edited or removed. This shows that even when users believe their data has been erased, it may still be recoverable through sophisticated attacks.

To counter this threat, the team introduced a defense strategy called subspace camouflage. This method injects semantic decoys into the update, obscuring the fingerprint and making it difficult for attackers to reconstruct the original data. Crucially, the defense preserves the usefulness of editing: the intended model corrections remain effective.
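
To make the "update fingerprint" idea concrete, here is a toy sketch in Python. It is not the authors' KSTER or subspace-camouflage code, and everything in it is an assumption made for illustration. An edit is modeled as a rank-one update delta_w = r k^T with a unit-norm key k; real locate-then-edit methods such as ROME also fold a key-covariance matrix into the update. The attacker is assumed to be able to enumerate hypothetical candidate subject keys. Gram-Schmidt on the rows of the update stands in for the spectral (SVD) step, since both yield the same row space, and the entropy-reduction stage is not modeled at all. The defense half adds hypothetical decoy directions made orthogonal to k. This puts the decoys into the row space while leaving the update's effect on the true key unchanged.

```python
import math
import random

# Toy model, NOT the authors' KSTER code. Assumptions (all hypothetical):
# an edit is a rank-one update delta_w = r k^T with unit key k (identity key
# covariance), and the attacker can enumerate candidate subject keys.


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def unit(a):
    n = math.sqrt(dot(a, a))
    return [x / n for x in a]


def outer(col, row):
    return [[c * x for x in row] for c in col]


def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def matvec(A, v):
    return [dot(row, v) for row in A]


def row_space_basis(M, rel_tol=1e-8):
    """Orthonormal basis of M's row space (modified Gram-Schmidt, two passes).

    Stands in for the spectral step: an SVD would give the same subspace."""
    basis = []
    for row in M:
        v = list(row)
        for _ in range(2):  # re-orthogonalize once for numerical safety
            for b in basis:
                c = dot(v, b)
                v = [x - c * y for x, y in zip(v, b)]
        if math.sqrt(dot(v, v)) > rel_tol * math.sqrt(dot(row, row)):
            basis.append(unit(v))
    return basis


def rank_candidates(delta_w, candidates):
    """Score each candidate key by the fraction of its norm inside the row space."""
    basis = row_space_basis(delta_w)
    scores = {
        name: math.sqrt(sum(dot(unit(vec), b) ** 2 for b in basis))
        for name, vec in candidates.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def camouflage(delta_w, k, decoys, rng):
    """Add decoy rank-one terms s d^T with d orthogonal to the true key k.

    Each decoy direction enters the row space, but since d . k == 0 the
    update still maps k exactly as before, so the edit keeps working."""
    out = delta_w
    for d in decoys:
        d_perp = unit([x - dot(d, k) * y for x, y in zip(d, k)])
        s = [rng.gauss(0.0, 1.0) for _ in range(len(delta_w))]
        out = add(out, outer(s, d_perp))
    return out


rng = random.Random(0)
D_IN, D_OUT = 16, 8
candidates = {f"subject_{i}": unit([rng.gauss(0.0, 1.0) for _ in range(D_IN)])
              for i in range(20)}
k = candidates["subject_7"]                      # the edited (secret) subject
r = [rng.gauss(0.0, 1.0) for _ in range(D_OUT)]  # hypothetical value residual
delta_w = outer(r, k)                            # rank-one edit: delta_w k == r

ranked = rank_candidates(delta_w, candidates)
print("attack guess:", ranked[0][0])

cam_w = camouflage(delta_w, k, [candidates["subject_3"], candidates["subject_11"]], rng)
tied = sorted(name for name, s in rank_candidates(cam_w, candidates) if s > 1 - 1e-6)
print("indistinguishable candidates:", tied)
print("edit drift:", max(abs(a - b) for a, b in zip(matvec(cam_w, k), r)))
```

Run as-is, the plain update singles out subject_7, because its key lies entirely in the one-dimensional row space. After camouflage, subject_7 is tied with the two decoys, and the edit drift stays at floating-point noise level. Keeping the decoys orthogonal to k is what lets this toy preserve the edit exactly; the article does not say whether the published defense is constructed this way.
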
The attack code and defense implementation are publicly available on GitHub, allowing other researchers and developers to study, test, and build upon the findings.

The work highlights a critical gap in current LLM security: while editing is faster and more efficient than retraining, it introduces new risks that must be addressed. This research underscores the need for stronger safeguards in AI systems, especially as LLMs are increasingly used in sensitive domains like healthcare, finance, and government. Future work may focus on refining defenses, developing new editing paradigms, and establishing standards to ensure that data deletion is truly irreversible. As AI becomes more embedded in daily life, protecting user privacy through robust, transparent security measures will be essential.
