HyperAI

A new study has revealed a significant security vulnerability in large language models (LLMs) related to model editing techniques. Researchers from the Shanghai Qi Zhi Institute, East China Normal University, Tsinghua University, and the Chinese Academy of Sciences have demonstrated that edits made to fix errors or remove sensitive information from LLMs can inadvertently expose confidential data through what they call "update fingerprints." These fingerprints are subtle patterns in the changes an edit makes to the model’s internal parameters. Model editing, particularly the "locate-then-edit" approach, is designed to fix issues without full retraining, but the researchers found that these updates can leak information about the original data that was edited.

The team developed a two-stage reverse-engineering attack called KSTER (Key Space Reconstruction-then-Entropy Reduction) that exploits the low-rank structure of parameter updates. In the first stage, the attack uses spectral analysis to recover the subject of the edit by identifying the row space of the update matrix, which acts as a fingerprint. In the second stage, it applies an entropy-based method to reconstruct the semantic context of the original input.

The researchers tested KSTER on several prominent LLMs, including GPT-J, Llama-3, and Qwen-2.5, and achieved high success rates in recovering sensitive data that had been edited or removed. This shows that even when users believe their data has been erased, it may still be recoverable through sophisticated attacks.

To counter this threat, the team introduced a defense strategy called subspace camouflage. This method adds semantic decoys to the update process, obscuring the fingerprint and making it difficult for attackers to reconstruct the original data. Crucially, the defense preserves the usefulness of the editing process, so model corrections remain effective.

The attack code and defense implementation are publicly available on GitHub, allowing other researchers and developers to study, test, and build upon the findings. The work highlights a critical gap in current LLM security: while editing is faster and more efficient than retraining, it introduces new risks that must be addressed. This research underscores the need for stronger safeguards in AI systems, especially as LLMs are increasingly used in sensitive domains like healthcare, finance, and government. Future work may focus on refining defenses, developing new editing paradigms, and establishing standards to ensure that data deletion is truly irreversible. As AI becomes more embedded in daily life, protecting user privacy through robust, transparent security measures will be essential.
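
To make the first attack stage more concrete, here is a minimal toy sketch in Python of how the row space of a low-rank update can act as a fingerprint. It is not the researchers' KSTER code. It assumes, purely for illustration, that an edit is an exact rank-one update formed as the outer product of a value vector and a key vector tied to the edited subject, and that the attacker holds a small set of candidate keys. The names `recover_key_direction`, `rank_candidates`, and `subject_a` through `subject_c` are hypothetical, and the vectors are random stand-ins for real model activations.

```python
import numpy as np


def recover_key_direction(delta_w: np.ndarray) -> np.ndarray:
    """Return the unit vector spanning the dominant row-space direction of delta_w."""
    # The rows of vt are the right singular vectors; for a rank-one update
    # the first one spans the entire row space (up to sign).
    _, _, vt = np.linalg.svd(delta_w, full_matrices=False)
    return vt[0]


def rank_candidates(direction: np.ndarray, candidate_keys: dict) -> tuple:
    """Score each candidate key by absolute cosine similarity to the direction."""
    scores = {
        name: float(abs(direction @ key) / np.linalg.norm(key))
        for name, key in candidate_keys.items()
    }
    ranking = sorted(scores, key=scores.get, reverse=True)
    return ranking, scores


# Toy setup (assumption): random vectors stand in for subject key vectors.
rng = np.random.default_rng(0)
d_in, d_out = 64, 32
candidate_keys = {
    name: rng.normal(size=d_in)
    for name in ["subject_a", "subject_b", "subject_c"]
}

# Assumed edit: the weight difference W_new - W_old is value * key^T.
true_subject = "subject_b"
value = rng.normal(size=d_out)
delta_w = np.outer(value, candidate_keys[true_subject])

# The attacker sees only delta_w and the candidate keys.
direction = recover_key_direction(delta_w)
ranking, scores = rank_candidates(direction, candidate_keys)
print("best match:", ranking[0])
for name in ranking:
    print(f"{name}: {scores[name]:.3f}")
```

In this toy setting the edited subject's key lines up with the recovered direction almost exactly, while unrelated keys score far lower. Real editing methods may transform the key before forming the update, so an actual attack would need to account for that; the sketch only shows why a low-rank update can reveal which subject was edited.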

AI Model Edits Can Leak Sensitive Data Through Update Fingerprints, Researchers Warn
