Falcon H1R 7B: A Compact Reasoning Model That Outperforms Larger AI Systems with Exceptional Efficiency
Introducing Falcon H1R 7B, a decoder-only large language model developed by the Technology Innovation Institute (TII) in Abu Dhabi. Built on the Falcon-H1 Base model, Falcon H1R 7B represents a major advancement in reasoning capabilities, delivering state-of-the-art performance despite its compact 7 billion parameters. The model matches or outperforms models up to 7 times larger across a wide range of reasoning benchmarks, demonstrating exceptional parameter efficiency.

Falcon H1R 7B's success stems from a two-stage training pipeline: supervised fine-tuning followed by reinforcement learning scaling, both powered by a carefully curated dataset. The model is designed around three core dimensions of reasoning efficiency: speed, token efficiency, and accuracy, defining what the team calls the "3-D limits" of performance.

A key innovation is the integration of Deep Think with Confidence (DeepConf), a lightweight, confidence-aware method used during test-time scaling. This approach dynamically filters out low-quality reasoning paths using the model's own confidence scores, improving accuracy without additional training or tuning.

In math benchmarks, Falcon H1R 7B achieves top-tier results. It scores 88.1% on AIME-24, surpassing the 15B Apriel 1.5 model (86.2%), and 83.1% on AIME-25, again ahead of the 15B Apriel 1.5 (80.0%). On HMMT-25, it reaches 64.9%, beating the 15B Apriel 1.5 (61.0%), and it scores 36.3% on AMO-Bench, far ahead of the 8B Qwen3-8B (23.3%).

In code and agentic tasks, Falcon H1R 7B also excels. It achieves 68.6% on LCB v6, the highest among all compared models, outperforming even the 32B Qwen3. On SciCode (sub-problem), it scores 28.3%, the best result among models under 8B. On TB Hard, it reaches 4.96%, second only to the 15B Apriel 1.5 (9.9%) and well ahead of the 8B and 32B Qwen3 variants.

On general-purpose benchmarks, Falcon H1R 7B holds its own. It scores 61.3% on GPQA-D, matching other 8B models. On MMLU-Pro, it reaches 72.1%, outperforming all 8B models and approaching the performance of 14B and 32B models. It scores 53.4% on IFBench, second only to Apriel 1.5 (55.8%) and ahead of all other 8B and 32B models, showing strong instruction-following ability at a small scale.

Inference performance is a standout feature. Falcon H1R 7B delivers significantly higher token throughput per GPU than Qwen3-8B, especially at larger batch sizes: around 1,000 tokens/s/GPU at batch 32, rising to roughly 1,500 at batch 64, nearly double Qwen3's rates. For longer inputs (8k to 16k tokens), Falcon reaches approximately 1,800 tokens/s/GPU, while Qwen3 remains below 900. This efficiency is enabled by its hybrid Transformer–Mamba backbone, which improves scaling and memory behavior.

Test-time scaling further amplifies Falcon H1R 7B's capabilities. With DeepConf, the model generates fewer tokens for the same accuracy, placing it on a new Pareto frontier of performance versus inference cost. The result is a model that delivers high accuracy with lower computational overhead.

Falcon H1R 7B is released under the Falcon LLM license, reflecting TII's commitment to open access and community collaboration. The team welcomes feedback and contributions as it continues to advance efficient, high-performing AI models. For more details, including technical documentation and code, visit the official GitHub repository.
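To make the DeepConf idea above more concrete, here is a minimal sketch of confidence-filtered voting over sampled reasoning traces. The exact DeepConf algorithm is not reproduced here; this sketch assumes each trace carries a scalar confidence (for example, a stand-in for the model's mean token probability), ranks traces by that score, discards the low-confidence fraction, and takes a confidence-weighted vote over the survivors. The function name, the `keep_fraction` parameter, and the scores are illustrative assumptions, not TII's implementation.

```python
from collections import Counter

def confidence_filtered_vote(traces, keep_fraction=0.5):
    """Aggregate answers from sampled reasoning traces, keeping only
    the most confident ones before a weighted majority vote.

    traces: list of (answer, confidence) pairs; confidence is a
    hypothetical stand-in for the model's own per-trace score.
    """
    if not traces:
        raise ValueError("need at least one trace")
    # Rank traces by confidence, highest first.
    ranked = sorted(traces, key=lambda t: t[1], reverse=True)
    # Keep only the top fraction; low-quality paths are filtered out.
    k = max(1, int(len(ranked) * keep_fraction))
    kept = ranked[:k]
    # Confidence-weighted vote over the surviving answers.
    votes = Counter()
    for answer, conf in kept:
        votes[answer] += conf
    return votes.most_common(1)[0][0]

# Example: four sampled traces; the two low-confidence ones disagree
# and are filtered out before voting.
traces = [("42", 0.92), ("42", 0.88), ("17", 0.35), ("13", 0.30)]
print(confidence_filtered_vote(traces))  # -> 42
```

Because low-confidence traces are dropped before (or instead of) being fully generated, this style of filtering is what lets the model spend fewer tokens for the same final accuracy.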
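The Pareto-frontier framing of accuracy versus inference cost can also be sketched in a few lines. An operating point (accuracy, generated tokens) sits on the frontier if no other point is at least as accurate while generating no more tokens. The operating points below are hypothetical illustrations, not measured results.

```python
def pareto_frontier(points):
    """Return the operating points not dominated by any other.

    A point (accuracy, tokens) is dominated if some other point has
    accuracy >= it and tokens <= it, with at least one strict inequality.
    """
    frontier = []
    for acc, tok in points:
        dominated = any(
            (a >= acc and t <= tok) and (a > acc or t < tok)
            for a, t in points
        )
        if not dominated:
            frontier.append((acc, tok))
    return frontier

# Hypothetical (accuracy %, mean generated tokens) operating points.
points = [(88.1, 9000), (86.0, 12000), (83.0, 6000), (80.0, 11000)]
print(pareto_frontier(points))  # -> [(88.1, 9000), (83.0, 6000)]
```

A method that shifts this frontier, as DeepConf is described as doing, offers strictly better accuracy-per-token trade-offs than the points it dominates.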
