Magistral

We introduce Magistral, Mistral's first reasoning model, and our own scalable reinforcement learning (RL) pipeline. Instead of relying on existing implementations and RL traces distilled from prior models, we follow a ground-up approach, relying solely on our own models and infrastructure. Notably, we demonstrate a stack that enabled us to explore the limits of pure RL training of LLMs, present a simple method to force the reasoning language of the model, and show that RL on text data alone maintains most of the initial checkpoint's capabilities. We find that RL on text maintains or improves multimodal understanding, instruction following, and function calling. We present Magistral Medium, trained for reasoning on top of Mistral Medium 3 with RL alone, and we open-source Magistral Small (Apache 2.0), which further includes cold-start data from Magistral Medium.