Baidu Open-Sources ERNIE 4.5: Advanced Language Models from 0.3B to 424B Parameters Now Available on Hugging Face

Baidu has officially open-sourced its latest ERNIE 4.5 series, a family of foundation models designed to enhance language understanding, reasoning, and generation. The release includes ten model variants, ranging from compact models with 0.3 billion parameters to massive Mixture-of-Experts (MoE) architectures with up to 424 billion parameters. All of them are freely available to the global research and developer community via Hugging Face, fostering open experimentation and broader access to cutting-edge Chinese and multilingual language technology.

Technical Overview of the ERNIE 4.5 Architecture

ERNIE 4.5 builds on Baidu's previous models by introducing advanced architectures, including both dense and sparsely activated MoE designs. The MoE variants are particularly noteworthy for how efficiently they scale parameter counts. The ERNIE 4.5-MoE-3B and ERNIE 4.5-MoE-47B models, for example, activate only a subset of experts per input token (typically 2 out of 64), keeping the number of active parameters manageable while preserving the models' expressivity and generalization capabilities (a toy sketch of this routing pattern appears at the end of this article).

The models are trained with a combination of supervised fine-tuning (SFT), reinforcement learning from human feedback (RLHF), and contrastive alignment techniques. The training data comprises 5.6 trillion tokens across diverse domains in both Chinese and English, processed through Baidu's proprietary multi-stage pretraining pipeline. As a result, the models perform well on instruction-following, multi-turn conversation, long-form generation, and reasoning benchmarks.

Model Variants and Open-Source Release

The ERNIE 4.5 release features ten model variants:

- Dense models: 0.3B, 3.8B, 7B, 20B, 45B, 110B, 130B, 193B
- MoE models: 3B, 47B, 424B

The MoE-47B variant, for example, activates only 3 billion parameters during inference despite having 47 billion in total, and the largest 424-billion-parameter variant relies on the same sparse-activation strategy to keep inference feasible and scalable. The models support both FP16 and INT8 quantization, which further reduces their deployment cost.

Performance Benchmarks

ERNIE 4.5 models show significant improvements on key Chinese and multilingual natural language processing (NLP) tasks. According to Baidu's technical report:

- Instruction-following tasks benefit from contrastive fine-tuning, leading to better alignment with user intent and lower hallucination rates than earlier ERNIE versions.
- The models excel on multi-turn conversation, long-form generation, and reasoning benchmarks.

Applications and Deployment

ERNIE 4.5 models are optimized for a wide range of applications. Some variants support context lengths of up to 128K tokens, making them suitable for tasks that require memory and reasoning over long documents or sessions. This versatility makes them useful in areas such as content creation, customer service, and research.

Conclusion

The ERNIE 4.5 series marks a significant step forward for open-source AI development. By releasing models that span from a lightweight 0.3B-parameter version to a 424B-parameter MoE model, Baidu demonstrates its commitment to inclusive and transparent AI research. With comprehensive documentation, easy access through Hugging Face, and support for efficient deployment, the ERNIE 4.5 models are poised to drive global progress in natural language understanding and generation. For more details, check out the paper and models on Hugging Face.
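To make the sparse-activation idea from the architecture overview concrete, here is a minimal, self-contained PyTorch sketch of top-2-of-64 expert routing. The layer sizes, expert count, and routing scheme are illustrative assumptions, not Baidu's actual ERNIE 4.5 implementation.

```python
# Toy Mixture-of-Experts layer with top-2-of-64 routing.
# All dimensions and routing details are illustrative assumptions,
# not taken from the ERNIE 4.5 codebase.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, num_experts=64, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # scores each token against every expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                              # x: (num_tokens, d_model)
        scores = self.router(x)                        # (num_tokens, num_experts)
        top_w, top_idx = scores.topk(self.top_k, -1)   # keep only the top-k experts per token
        top_w = F.softmax(top_w, dim=-1)               # renormalize the surviving routing weights
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in range(len(self.experts)):
                mask = top_idx[:, k] == e              # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] += top_w[mask, k:k+1] * self.experts[e](x[mask])
        return out                                     # each token was processed by only top_k experts

tokens = torch.randn(8, 512)
print(ToyMoELayer()(tokens).shape)                     # torch.Size([8, 512])
```

The point of the routing loop is that each token touches only `top_k` experts, so per-token compute stays close to that of a small dense model even as the total parameter count grows with the number of experts.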
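The article mentions contrastive alignment and contrastive fine-tuning without detailing the objective. As a generic, textbook-style illustration only (not Baidu's actual loss), a pairwise contrastive preference loss can be sketched as follows:

```python
# Generic pairwise contrastive preference loss of the kind used in alignment
# fine-tuning. This is an illustrative sketch, not ERNIE 4.5's actual objective.
import torch
import torch.nn.functional as F

def contrastive_preference_loss(logp_chosen, logp_rejected, margin=0.0):
    """Push the policy to score preferred responses above rejected ones.

    logp_chosen / logp_rejected: summed log-probabilities the model assigns
    to the human-preferred and dispreferred responses for the same prompt.
    """
    return -F.logsigmoid(logp_chosen - logp_rejected - margin).mean()

# Toy usage with made-up log-probabilities for a batch of 3 prompts.
chosen = torch.tensor([-12.3, -8.7, -15.1])
rejected = torch.tensor([-14.0, -9.9, -15.0])
print(contrastive_preference_loss(chosen, rejected))
```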
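Since the checkpoints are distributed through Hugging Face, loading one with the standard transformers API should look roughly like the sketch below. The repository ID is an assumption used for illustration; check Baidu's organization page on the Hub for the exact names of the released variants.

```python
# Minimal sketch of loading an ERNIE 4.5 checkpoint from Hugging Face in FP16.
# The model ID is illustrative, and the chat-template call assumes the
# repository ships one.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "baidu/ERNIE-4.5-0.3B-PT"  # assumed/illustrative repository name

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # FP16 inference, as the release supports
    device_map="auto",           # place weights on available GPU(s)/CPU automatically
    trust_remote_code=True,      # the repo may ship custom modeling code
)

messages = [{"role": "user", "content": "Summarize the ERNIE 4.5 release in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```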
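For the INT8 path mentioned above, one common route is weight-only 8-bit loading via bitsandbytes through transformers. This is a generic recipe rather than anything ERNIE-specific, and it reuses the same illustrative repository ID.

```python
# Sketch of INT8 loading with bitsandbytes quantization via transformers.
# Generic recipe; the model ID is an assumption for illustration.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(load_in_8bit=True)  # weight-only INT8 quantization

model_int8 = AutoModelForCausalLM.from_pretrained(
    "baidu/ERNIE-4.5-0.3B-PT",        # assumed repository name
    quantization_config=quant_config,
    device_map="auto",
    trust_remote_code=True,
)
```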
All credit for this research goes to the dedicated team at Baidu.
