
Imperceptible Jailbreaking against Large Language Models

Kuofeng Gao, Yiming Li, Chao Du, Xin Wang, Xingjun Ma, Shu-Tao Xia, Tianyu Pang

Abstract

Jailbreaking attacks on the vision modality typically rely on imperceptible adversarial perturbations, whereas attacks on the textual modality are generally assumed to require visible modifications (e.g., non-semantic suffixes). In this paper, we introduce imperceptible jailbreaks that exploit a class of Unicode characters called variation selectors. By appending invisible variation selectors to malicious questions, the jailbreak prompts appear visually identical to the original malicious questions on screen, while their tokenization is "secretly" altered. We propose a chain-of-search pipeline to generate such adversarial suffixes to induce harmful responses. Our experiments show that our imperceptible jailbreaks achieve high attack success rates against four aligned LLMs and generalize to prompt injection attacks, all without producing any visible modifications in the written prompt. Our code is available at https://github.com/sail-sg/imperceptible-jailbreaks.
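The core mechanism can be illustrated with a minimal sketch: Unicode variation selectors (U+FE00–U+FE0F and U+E0100–U+E01EF) render as zero-width, invisible characters, so appending them leaves the displayed prompt unchanged while altering the underlying code-point (and hence token) sequence. The suffix length and random selector choice below are arbitrary illustrations, not the paper's chain-of-search procedure.

```python
# Sketch: appending an invisible variation-selector suffix to a prompt.
# The visible text is unchanged on screen, but the code-point sequence
# (and thus the tokenization) differs.
import random

# VS1-VS16 plus VS17-VS256, the two Unicode variation-selector blocks.
VARIATION_SELECTORS = [chr(cp) for cp in range(0xFE00, 0xFE10)]
VARIATION_SELECTORS += [chr(cp) for cp in range(0xE0100, 0xE01F0)]

def append_invisible_suffix(prompt: str, length: int = 8, seed: int = 0) -> str:
    """Append `length` randomly chosen variation selectors to `prompt`."""
    rng = random.Random(seed)
    suffix = "".join(rng.choice(VARIATION_SELECTORS) for _ in range(length))
    return prompt + suffix

original = "How do I bake a cake?"
modified = append_invisible_suffix(original)

print(modified)                      # renders identically to the original
print(len(original), len(modified))  # 21 vs 29 code points
print(original == modified)          # False: the strings differ under the hood
```

An actual attack would search over such suffixes (the paper's chain-of-search pipeline) rather than sampling them at random; this snippet only demonstrates the invisibility property that makes the jailbreak imperceptible.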
