Jailbreaking

Jailbreaking can be defined as a way of breaking the ethical safeguards of AI models such as ChatGPT. With the help of certain specific text prompts, content-moderation guidelines can be bypassed, freeing the model from its restrictions; a jailbroken model such as ChatGPT can then answer questions it would normally refuse. These specific prompts are themselves also called "jailbreaks".
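
To make the idea of a safeguard concrete, the sketch below shows the kind of guardrail layer a jailbreak prompt tries to slip past: user input is screened by a moderation model before it ever reaches the chat model. It assumes the `openai` Python SDK (v1+) with an API key in the environment; the model names and refusal text are illustrative choices, not fixed parts of the API.

```python
# Minimal sketch of a guardrail layer, assuming the `openai` SDK.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def guarded_chat(user_prompt: str) -> str:
    # Step 1: screen the prompt with a moderation model. A successful
    # jailbreak is a prompt rephrased so that this check stops firing.
    report = client.moderations.create(
        model="omni-moderation-latest",  # assumed model name
        input=user_prompt,
    )
    if report.results[0].flagged:
        return "Request refused by content policy."

    # Step 2: only screened prompts reach the chat model, together with
    # a system message restating the behavioural rules.
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model name
        messages=[
            {"role": "system",
             "content": "Follow the content policy and refuse disallowed requests."},
            {"role": "user", "content": user_prompt},
        ],
    )
    return reply.choices[0].message.content
```

In these terms, a jailbreak is any wording that makes the first check pass and the second call comply with a request the system message forbids.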

Jailbreaking Threats to LLMs

  • Static data – The first limitation of LLMs is that they are trained on static data. ChatGPT, for example, was trained on data available up to September 2021, so it has no access to anything newer. An LLM can be retrained on fresh datasets, but this is not an automatic process; the model has to be updated deliberately and regularly (a minimal sketch of such an update step follows this list).
  • Personal information exposure – Another threat is that LLM providers may use prompts to learn from and improve their models. At present, LLMs are trained on a fixed body of data and then used to answer user queries, and those queries are not fed back into the training set; however, the queries/prompts are visible to, and stored by, the LLM provider, so there is always a possibility that user data will later be used to train the model. These privacy issues must be examined thoroughly before using an LLM, and sensitive details can be stripped from prompts up front (see the redaction sketch after this list).
  • Generating inappropriate content – LLMs can generate incorrect facts and, via jailbreaks, toxic content. There is also a risk of prompt-injection attacks, which can be used to trick AI models into identifying vulnerabilities in open-source code or into creating phishing websites.
  • Creating malware and cyber attacks – A further issue is the creation of malware with the help of LLM-based models such as ChatGPT: people with little technical skill can use an LLM to write it, and criminals can use an LLM to get technical advice about cyber attacks. Here too, jailbreak prompts can be used to bypass the models' restrictions.
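
As noted in the static-data item above, refreshing an LLM's knowledge is a deliberate retraining step rather than anything automatic. The following is a hedged sketch of that step using the Hugging Face `transformers` and `datasets` libraries; the GPT-2 checkpoint, the `new_articles.txt` path, and the training settings are placeholders, not a production recipe.

```python
# Hedged sketch: continue training a small causal LM on newer text.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a pad token
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Newer documents the base model has never seen (placeholder path).
raw = load_dataset("text", data_files={"train": "new_articles.txt"})
train_set = raw["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="updated-model", num_train_epochs=1),
    train_dataset=train_set,
    # mlm=False selects the standard next-token objective; the collator
    # pads each batch and derives the labels from the input ids.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()                      # the manual, non-automatic refresh
trainer.save_model("updated-model")
```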
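
For the personal-information concern above, one practical precaution is to scrub obvious identifiers from prompts before they leave your own infrastructure. The sketch below is minimal and illustrative: the `redact` helper and its three patterns are assumptions for demonstration, not a complete PII solution.

```python
# Strip obvious personal data from a prompt before sending it to an
# LLM provider; each match is replaced with a typed placeholder.
import re

PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def redact(prompt: str) -> str:
    """Replace matched identifiers with placeholders such as [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        prompt = pattern.sub(f"[{label.upper()}]", prompt)
    return prompt

print(redact("Reach Jane at jane.doe@example.com or +1 (555) 010-2000."))
# -> "Reach Jane at [EMAIL] or [PHONE]."
```

Whatever survives redaction is still stored by the provider, so the safest assumption is that anything placed in a prompt may one day appear in a training set.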
