
2 months ago

Baichuan-M2: Scaling Medical Capability with Large Verifier System


Abstract

As large language models (LLMs) advance in conversational and reasoning capabilities, their practical application in healthcare has become a critical research focus. However, there is a notable gap between the performance of medical LLMs on static benchmarks such as USMLE and their utility in real-world clinical decision-making. This discrepancy arises because traditional exams fail to capture the dynamic, interactive nature of medical consultations. To address this challenge, we introduce a novel dynamic verification framework that moves beyond static answer verifiers, establishing a large-scale, high-fidelity interactive reinforcement learning system. Our framework comprises two key components: a Patient Simulator that creates realistic clinical environments using de-identified medical records, and a Clinical Rubrics Generator that dynamically produces multi-dimensional evaluation metrics. Building on this foundation, we develop Baichuan-M2, a 32B-parameter medical augmented reasoning model trained through a multi-stage reinforcement learning strategy with an improved Group Relative Policy Optimization (GRPO) algorithm. Evaluated on HealthBench, Baichuan-M2 outperforms all other open-source models and most advanced closed-source counterparts, achieving a score above 32 on the challenging HealthBench Hard benchmark, a level previously exceeded only by GPT-5. Our work demonstrates that a robust dynamic verifier system is essential for aligning LLM capabilities with practical clinical applications, establishing a new Pareto front in the performance-parameter trade-off for medical AI deployment.
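To make the interaction between rubric-based verification and GRPO more concrete, here is a minimal sketch under stated assumptions. It is not the authors' implementation. The rubric items, their point values, and the per-criterion "met" judgments are all hypothetical; in the described system they would come from the Clinical Rubrics Generator and a grader. The scoring rule (points earned divided by the maximum positive points, clipped to [0, 1]) is an assumed HealthBench-style convention. The advantage function is the standard group-relative normalization from GRPO and does not include the paper's unspecified "improved" modifications.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class RubricItem:
    """Hypothetical rubric criterion: a description and a signed point value."""
    criterion: str
    points: float  # positive = desirable behavior, negative = penalized behavior


def rubric_reward(items: list[RubricItem], met: list[bool]) -> float:
    """Assumed scoring rule: points earned over max positive points, clipped to [0, 1]."""
    max_points = sum(it.points for it in items if it.points > 0)
    earned = sum(it.points for it, ok in zip(items, met) if ok)
    if max_points == 0:
        return 0.0
    return min(1.0, max(0.0, earned / max_points))


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Standard GRPO advantage: center each reward on its group mean, scale by group std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rubric for one simulated patient consultation.
rubric = [
    RubricItem("Asks about symptom duration", 5.0),
    RubricItem("Recommends urgent care for red-flag symptoms", 8.0),
    RubricItem("States a definitive diagnosis without enough information", -6.0),
]

# Which criteria each of four sampled responses met (grader judgments assumed given).
judgments = [
    [True, True, False],
    [True, False, False],
    [False, False, True],
    [True, True, True],
]

rewards = [rubric_reward(rubric, m) for m in judgments]
advantages = group_relative_advantages(rewards)
print(rewards)
print(advantages)
```

Because advantages are computed relative to the other responses sampled for the same consultation, no separate value network is needed. A response is pushed up only if it satisfies the rubric better than its siblings, which is one reason rubric-style verifiers pair naturally with GRPO-style training.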
