
Apple's Decrypted Generative Model Safety Filters Reveal Key Content Controls


GitHub repository: BlueFalconHD/apple_generative_model_safety_decrypted

## Overview

This repository contains decrypted safety files for Apple's generative models. These files include filters designed to ensure the models comply with safety standards and avoid generating harmful content.

## Structure

## Usage

To use this repository, follow these steps.

### Python Dependencies

The only required dependency is `cryptography`, which can be installed via pip:

```sh
pip install cryptography
```

### Getting the Encryption Key

To retrieve the encryption key generated by `ModelCatalog.Obfuscation.readObfuscatedContents`, attach LLDB (a debugger) to the `GenerativeExperiencesSafetyInferenceProvider` binary located at `/System/Library/ExtensionKit/Extensions/GenerativeExperiencesSafetyInferenceProvider.appex/Contents/MacOS/GenerativeExperiencesSafetyInferenceProvider`. It is crucial to use Xcode's version of LLDB rather than the default macOS or LLVM versions. Here is a recommended method to attach LLDB:

```sh
xcrun lldb --attach-pid <pid_of_GenerativeExperiencesSafetyInferenceProvider>
```

### Decrypting the Overrides

To decrypt the safety overrides, execute the following command in the root directory of this repository:

```sh
python decrypt_overrides.py
```

- The script will create a directory named `decrypted_overrides` if it does not already exist and place the decrypted files within it.
- This step is only necessary if the overrides have been recently updated; the repository already includes a decrypted version of the overrides as of June 28, 2025.

## Understanding the Overrides

The decrypted overrides are JSON files that contain safety filters for various generative models. Each override is tied to a specific model context and includes rules that dictate how the model should handle certain situations, such as filtering out harmful content or ensuring compliance with safety standards.

### Example: Metadata JSON File

The `metadata.json` file, sourced from `dec_out_repo/decrypted_overrides/com.apple.gm.safety_deny.output.code_intelligence.base`, provides a clear illustration of the safety measures in place:

```json
{
  "reject": [
    "xylophone copious opportunity defined elephant 10out",
    "xylophone copious opportunity defined elephant out"
  ],
  "remove": [],
  "replace": {},
  "regexReject": [
    "(?i)\\bbitch\\b",
    "(?i)\\bdago\\b",
    "(?i)\\bdyke\\b",
    "(?i)\\bhebe\\b",
    ...
  ],
  "regexRemove": [],
  "regexReplace": {}
}
```

- `reject`: exact phrases that, if detected in the model's output, trigger a guardrail violation.
- `remove`: phrases that will be removed from the model's output.
- `replace`: mappings where certain phrases are replaced with others in the output.
- `regexReject`: regular expressions for matching and rejecting content.
- `regexRemove`: regular expressions for removing content.
- `regexReplace`: regular expressions for replacing content.

This repository serves as a valuable resource for understanding the safety mechanisms employed by Apple's generative models, particularly in how they manage and filter potentially harmful or inappropriate content.
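
The repository does not prescribe the exact in-debugger steps for the key-extraction stage described above, so the following is only a sketch of one plausible workflow once LLDB is attached. The symbol name used for the breakpoint, the use of register `x0`, and the 32-byte key length are all assumptions for illustration, not confirmed details.

```sh
# Hypothetical LLDB session; symbol name, register, and key length are assumptions.
xcrun lldb --attach-pid "$(pgrep -f GenerativeExperiencesSafetyInferenceProvider)"
(lldb) breakpoint set --name readObfuscatedContents
(lldb) continue
# When the breakpoint hits, step out so the function's result is available:
(lldb) thread step-out
# On arm64, a returned pointer typically lands in x0; inspect the bytes it points to:
(lldb) memory read --size 1 --format x --count 32 $x0
```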
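
The actual decryption logic lives in the repository's `decrypt_overrides.py`. Purely for illustration, here is a minimal sketch of what decrypting a single override file could look like with the `cryptography` package, assuming an AES-GCM scheme with a nonce-prefixed payload. The cipher choice, the 12-byte nonce layout, and the file path are assumptions, not confirmed details of Apple's format.

```python
# Illustrative sketch only: AES-GCM, the nonce-prefix layout, and the
# file path below are assumptions, not Apple's confirmed format.
from pathlib import Path

from cryptography.hazmat.primitives.ciphers.aead import AESGCM


def decrypt_override(encrypted_path: Path, key: bytes) -> bytes:
    """Decrypt one override file, assuming a nonce || ciphertext layout."""
    blob = encrypted_path.read_bytes()
    nonce, ciphertext = blob[:12], blob[12:]  # 12-byte nonce is an assumption
    return AESGCM(key).decrypt(nonce, ciphertext, None)


if __name__ == "__main__":
    key = bytes.fromhex("00" * 32)  # placeholder: use the key recovered via LLDB
    plaintext = decrypt_override(Path("overrides/metadata.json.enc"), key)
    print(plaintext.decode("utf-8"))
```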
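
To make the six filter fields concrete, here is a short sketch of how an override like the one above could be applied to a piece of model output. The function name and return shape are hypothetical; this is a direct reading of the documented field semantics, not Apple's implementation.

```python
# Hypothetical filter application: mirrors the six documented fields, but
# the function name and return shape are illustrative, not Apple's code.
import re


def apply_safety_override(override: dict, text: str) -> tuple[bool, str]:
    """Return (rejected, filtered_text) for one decrypted override."""
    # "reject": exact phrases that trigger a guardrail violation.
    if any(phrase in text for phrase in override.get("reject", [])):
        return True, text
    # "regexReject": regex patterns that trigger a guardrail violation.
    if any(re.search(p, text) for p in override.get("regexReject", [])):
        return True, text
    # "remove" / "regexRemove": strip matching content from the output.
    for phrase in override.get("remove", []):
        text = text.replace(phrase, "")
    for pattern in override.get("regexRemove", []):
        text = re.sub(pattern, "", text)
    # "replace" / "regexReplace": substitute matching content.
    for old, new in override.get("replace", {}).items():
        text = text.replace(old, new)
    for pattern, repl in override.get("regexReplace", {}).items():
        text = re.sub(pattern, repl, text)
    return False, text


if __name__ == "__main__":
    override = {
        "reject": [], "remove": [], "replace": {},
        "regexReject": [r"(?i)\bdyke\b"], "regexRemove": [], "regexReplace": {},
    }
    print(apply_safety_override(override, "perfectly benign output"))
    # -> (False, 'perfectly benign output')
```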
