HyperAI

UltraSafety Large Model Safety Evaluation Dataset

Date

2 years ago

Size

17.43 MB

Organization

Tsinghua University

The UltraSafety dataset was jointly created by Renmin University, Tsinghua University, and Tencent to evaluate and improve the safety of large models. UltraSafety draws 1,000 seed safety instructions from AdvBench and MaliciousInstruct, and uses Self-Instruct to generate an additional 2,000 instructions. The research team manually screened the jailbreak prompts in AutoDAN and ultimately selected 830 high-quality ones. In total, UltraSafety contains 3,000 harmful instructions, each paired with an associated jailbreak prompt. Each harmful instruction is accompanied by completions generated by models of varying security levels, together with a rating assigned by GPT-4, where a rating of 1 means harmless and a rating of 0 means harmful. Through these detailed safety-related instructions, UltraSafety aims to help researchers train models that can identify and prevent potential security threats.
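The description above implies a simple record structure: a harmful instruction, a jailbreak prompt, and one or more model completions, each with a binary GPT-4 rating (1 = harmless, 0 = harmful). The sketch below shows how such records might be filtered once loaded; the field names (`instruction`, `jailbreak_prompt`, `completions`, `gpt4_rating`) are illustrative assumptions, and the actual keys inside UltraSafety.zip may differ.

```python
import json

# Hypothetical records mirroring the described schema; actual field names
# in UltraSafety.zip may differ.
records = [
    {
        "instruction": "example harmful instruction A",
        "jailbreak_prompt": "example jailbreak prompt A",
        "completions": [
            {"response": "refusal text", "gpt4_rating": 1},  # 1 = harmless
            {"response": "unsafe text", "gpt4_rating": 0},   # 0 = harmful
        ],
    },
    {
        "instruction": "example harmful instruction B",
        "jailbreak_prompt": "example jailbreak prompt B",
        "completions": [
            {"response": "unsafe text", "gpt4_rating": 0},
        ],
    },
]

def harmless_completions(recs):
    """Collect (instruction, response) pairs that GPT-4 rated harmless (1)."""
    pairs = []
    for rec in recs:
        for comp in rec["completions"]:
            if comp["gpt4_rating"] == 1:
                pairs.append((rec["instruction"], comp["response"]))
    return pairs

# Harmless-rated completions are candidates for safe-response training data.
safe_pairs = harmless_completions(records)
print(json.dumps(safe_pairs))
```

A filter like this is one way to separate harmless-rated completions (e.g. for supervised fine-tuning on safe responses) from harmful ones (e.g. as rejected samples in preference training).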

UltraSafety.torrent
Seeding 2 · Downloading 1 · Completed 374 · Total Downloads 839
  • UltraSafety/
    • README.md
      1.53 KB
    • README.txt
      3.07 KB
    • data/
      • UltraSafety.zip
        17.43 MB
