HyperAI

IBM Releases Granite 4.1 Large Language Model

IBM has unveiled Granite 4.1, a new family of open-source large language models with dense decoder-only architectures in 3 billion, 8 billion, and 30 billion parameter variants. Built with a four-stage training pipeline, the models were trained from scratch on approximately 15 trillion tokens of curated data. The release emphasizes data quality over scale, and IBM reports performance that rivals or surpasses significantly larger mixture-of-experts models. The models are released under the Apache 2.0 license, which permits commercial and research use.

Pre-training followed a five-phase approach designed to refine model capabilities progressively. The initial phases focused on broad language understanding using general web data. Subsequent phases increased the share of mathematical and coding content to strengthen reasoning. A mid-training phase annealed the model on high-quality data, blending chain-of-thought reasoning with synthetic instruction data. The final phase introduced long-context training, extending the context window to 512,000 tokens through a staged extension process that preserves short-context performance. The architecture uses Grouped Query Attention (GQA), Rotary Position Embeddings (RoPE), and SwiGLU activations for efficiency.

Following pre-training, the models underwent supervised fine-tuning on roughly 4.1 million high-quality samples. To ensure data integrity, IBM combined an LLM-as-Judge framework with rule-based filtering to detect and correct hallucinations, factual errors, and structural defects. This pipeline evaluates assistant responses against criteria including correctness, instruction following, and naturalness, so that only data passing these checks is used to train the final models.

Further refinement came from a multi-stage reinforcement learning pipeline. It uses on-policy GRPO with a specific loss function to optimize for diverse capabilities without inducing catastrophic forgetting. The sequence begins with multi-domain reinforcement learning covering tasks such as logic, science, and tool use, followed by Reinforcement Learning from Human Feedback (RLHF) to improve general chat and helpfulness. Additional stages cover identity calibration and math benchmark recovery, countering the performance drops observed in other RL approaches.

Benchmark results show that the 8 billion parameter dense model matches or exceeds the previous Granite 4.0-H-Small, a 32 billion parameter mixture-of-experts model with 9 billion active parameters. The Granite 4.1 models perform well on general knowledge, complex reasoning, and coding tasks. The 30 billion parameter variant leads across all metrics, with significant gains in math and code generation. The family also supports FP8 quantization for efficient inference, reducing memory and disk usage by approximately 50% while maintaining accuracy.

Training was conducted on an NVIDIA GB200 NVL72 cluster hosted on CoreWeave, sized for the token volumes involved. The models support twelve languages, including English, Spanish, French, Japanese, and Arabic. Because Granite 4.1 avoids extended reasoning chains for standard tasks, it offers predictable latency and stable token usage, which suits enterprise applications that need reliability and cost control. IBM has released the models and code examples publicly and invites the developer community to adopt and build on them.
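The on-policy GRPO step mentioned above is built around a group-relative advantage: several responses are sampled for the same prompt, each is scored, and each score is normalized against the group's mean and standard deviation. The sketch below shows only that normalization in its commonly published form. It is not IBM's loss function, which the article does not specify, and the function name, the `eps` parameter, and the example rewards are illustrative assumptions.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize per-response rewards within one prompt's sample group.

    Assumption: the commonly published GRPO form,
    A_i = (r_i - mean(r)) / (std(r) + eps).
    The article does not describe IBM's exact loss, so this is illustrative.
    """
    if not rewards:
        raise ValueError("rewards must contain at least one score")
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population standard deviation of the group
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to a single prompt.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Responses that score above the group average receive a positive advantage and are reinforced, while a group whose rewards are all identical yields zero advantages and therefore no learning signal.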
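The roughly 50% saving attributed to FP8 quantization follows from bytes per weight, assuming the unquantized checkpoints store weights in a 16-bit format such as BF16 (an assumption; the article does not name the baseline). The sketch below estimates weight storage only and ignores activations, the KV cache, and quantization scale metadata. The helper name and lookup tables are illustrative, and the parameter counts are the nominal sizes quoted in the article.

```python
# Bytes needed to store one weight in each format.
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}

# Nominal parameter counts (in billions) from the article.
MODEL_SIZES_B = {"3B": 3, "8B": 8, "30B": 30}


def weight_footprint_gb(params_billion, dtype):
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * BYTES_PER_PARAM[dtype] / 1e9


for name, size in MODEL_SIZES_B.items():
    bf16 = weight_footprint_gb(size, "bf16")
    fp8 = weight_footprint_gb(size, "fp8")
    reduction = 1 - fp8 / bf16
    print(f"{name}: BF16 ~{bf16:.0f} GB, FP8 ~{fp8:.0f} GB, reduction {reduction:.0%}")
```

Under these assumptions the weights alone halve in size for every variant; real checkpoints may differ slightly because some tensors are typically kept in higher precision.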
