Nemotron 3.5 Content Safety
NVIDIA has released Nemotron 3.5 Content Safety, a 4B-parameter model that combines multimodal evaluation, multilingual coverage, custom policy enforcement, and auditable reasoning in a single inference pipeline. It targets gaps in enterprise moderation by evaluating text, images, and assistant responses together, so it can catch violations that only emerge from cross-modal or conversational context. The model is explicitly trained on twelve languages, including English, Chinese, Arabic, and Hindi, and relies on zero-shot generalization from its Gemma 3 base for roughly 140 languages.
Custom policy enforcement lets organizations place domain-specific safety guidelines directly in the prompt. The model applies these rules dynamically, which accommodates differing risk profiles in areas such as healthcare, finance, and education. An optional THINK mode generates a concise reasoning trace before the verdict, giving an auditable justification for compliance purposes; latency-sensitive deployments can disable it for fast binary classification.
Built on Google Gemma 3 4B and fine-tuned with a LoRA adapter, the model is compact enough for real-time deployment on GPUs with 8 GB of VRAM or more. It supports a 128K context window and three output configurations: binary verdict, binary verdict with category tagging, and full reasoning mode.
The training dataset combines multilingual text, human-annotated multimodal samples, and professional content examples. Notably, 99 percent of the training images are real photographs rather than synthetic generations, which addresses adversarial blind spots in existing benchmarks. Reasoning traces were synthesized with larger Qwen teacher models and then compressed to minimize inference overhead.
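To make the prompt-level policy injection and the three output configurations more concrete, here is a minimal Python sketch of how a caller might assemble a moderation request. Everything in it is an assumption for illustration: the names `OutputMode`, `ModerationRequest`, and `build_prompt`, the `POLICY:` / `USER:` / `ASSISTANT:` / `OUTPUT:` section labels, and the sample policy text are hypothetical and are not the model's actual prompt template, which should be taken from the official model card.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class OutputMode(Enum):
    """The three output configurations named in the article."""
    VERDICT = "binary verdict"
    VERDICT_WITH_CATEGORIES = "binary verdict with category tagging"
    THINK = "full reasoning mode"


@dataclass
class ModerationRequest:
    """Hypothetical container for one moderation call (text only)."""
    user_text: str
    assistant_response: Optional[str] = None  # include to moderate the reply too
    custom_policy: Optional[str] = None       # domain-specific rules, if any
    mode: OutputMode = OutputMode.VERDICT


def build_prompt(req: ModerationRequest) -> str:
    """Assemble a plain-text prompt.

    The section labels below are invented for this sketch; the real
    template is defined by the model, not by this code.
    """
    parts = []
    if req.custom_policy:
        # The custom policy goes first so it frames everything after it.
        parts.append("POLICY:\n" + req.custom_policy.strip())
    parts.append("USER:\n" + req.user_text.strip())
    if req.assistant_response is not None:
        parts.append("ASSISTANT:\n" + req.assistant_response.strip())
    parts.append("OUTPUT: " + req.mode.value)
    return "\n\n".join(parts)


# Example: a healthcare deployment that wants an auditable reasoning trace.
example = ModerationRequest(
    user_text="What dose of ibuprofen is safe for me?",
    custom_policy="Do not give individualized dosing advice.",
    mode=OutputMode.THINK,
)
print(build_prompt(example))
```

Switching `mode` to `OutputMode.VERDICT` in this sketch corresponds to the latency-sensitive path described above, where the reasoning trace is skipped and only the binary verdict is requested.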
Benchmarked on Aegis, RTP-LX, VLGuard, and other industry-standard test sets, the model achieves approximately 85 percent average accuracy across multimodal and multilingual safety tests, reaching 96.5 percent on Multilingual Aegis and 88.8 percent on RTP-LX. In standard mode, end-to-end latency is consistent with the previous iteration, while THINK mode adds a predictable overhead for generating the reasoning trace. Compared with competing reasoning-based safety models, Nemotron 3.5 generates up to 50 percent fewer output tokens during reasoning, which reduces the compute spent on decoding.
The model and its training dataset are publicly available under the NVIDIA Open Model License. Developers can deploy it via Hugging Face, vLLM, SGLang, and pre-optimized NVIDIA NIM microservices, as well as third-party inference platforms. By consolidating these safety capabilities in a compact architecture, Nemotron 3.5 is aimed at scalable, multilingual, compliant, and auditable content moderation for production AI systems.
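The relationship between output-token count and THINK mode overhead can be illustrated with a back-of-the-envelope latency model. The sketch below assumes a fixed prefill cost plus a constant per-token decode cost; the function name `estimated_latency_ms` and all numeric values are hypothetical and chosen only for illustration, not measurements of any model.

```python
def estimated_latency_ms(output_tokens: int, ms_per_token: float, prefill_ms: float) -> float:
    """Simplified model: fixed prefill cost plus linear per-token decode cost."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return prefill_ms + output_tokens * ms_per_token


# Hypothetical values for illustration only; not benchmark results.
PREFILL_MS = 40.0      # time to process the input prompt
MS_PER_TOKEN = 5.0     # decode time per generated token

baseline_tokens = 200                   # assumed trace length of a longer reasoning model
reduced_tokens = baseline_tokens // 2   # the "up to 50 percent fewer" case

baseline_ms = estimated_latency_ms(baseline_tokens, MS_PER_TOKEN, PREFILL_MS)
reduced_ms = estimated_latency_ms(reduced_tokens, MS_PER_TOKEN, PREFILL_MS)

# Fraction of end-to-end latency saved by halving the reasoning trace.
saving = 1.0 - reduced_ms / baseline_ms

print(f"baseline: {baseline_ms:.0f} ms, reduced: {reduced_ms:.0f} ms, saving: {saving:.1%}")
```

Under these assumptions, halving the output tokens halves the decode portion of the latency, but the end-to-end saving comes out slightly below 50 percent because the prefill cost does not change.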
