Mutarjim: Advancing Bidirectional Arabic-English Translation with a Small Language Model

Hennara, Khalil; Hreden, Muhammad; Hamed, Mohamed Motaism; Aldallal, Zeina; Chrouf, Sara; AlModhayan, Safwan
Release Date: 5/27/2025
Abstract

We introduce Mutarjim, a compact yet powerful language model for bidirectional Arabic-English translation. While large-scale LLMs have shown impressive progress in natural language processing tasks, including machine translation, smaller models can still deliver strong translation quality. Leveraging this insight, we developed Mutarjim based on Kuwain-1.5B, a language model tailored for both Arabic and English. Despite its modest size, Mutarjim outperforms much larger models on several established benchmarks, a result achieved through an optimized two-phase training approach and a carefully curated, high-quality training corpus. Experimental results show that Mutarjim rivals models up to 20 times larger while significantly reducing computational costs and training requirements. We also introduce Tarjama-25, a new benchmark designed to overcome limitations in existing Arabic-English benchmarking datasets, such as domain narrowness, short sentence lengths, and English-source bias. Tarjama-25 comprises 5,000 expert-reviewed sentence pairs and spans a wide range of domains, offering a more comprehensive and balanced evaluation framework. Notably, Mutarjim achieves state-of-the-art performance on the English-to-Arabic task in Tarjama-25, surpassing even significantly larger and proprietary models like GPT-4o mini. We publicly release Tarjama-25 to support future research and advance the evaluation of Arabic-English translation systems.
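
As a rough illustration of how translation systems could be scored on a benchmark of sentence pairs like Tarjama-25, the Python sketch below computes corpus-level BLEU and chrF with the sacrebleu library. The sentence pairs, metric choice, and variable names are illustrative assumptions, not actual Tarjama-25 data or the authors' evaluation pipeline.

# Minimal sketch: scoring candidate English-to-Arabic translations against
# references with corpus-level BLEU and chrF via sacrebleu.
# The example pairs below are placeholders, NOT Tarjama-25 data.
import sacrebleu

# Hypothetical system outputs (Arabic hypotheses) and gold references.
hypotheses = [
    "النموذج الصغير يترجم النص بدقة عالية.",
    "تم إصدار المعيار الجديد للبحث العام.",
]
references = [
    "يترجم النموذج الصغير النص بدقة عالية.",
    "أُصدر المعيار الجديد للبحث العام.",
]

# sacrebleu expects a list of hypothesis strings and a list of
# reference lists (one inner list per reference set).
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
chrf = sacrebleu.corpus_chrf(hypotheses, [references])

print(f"BLEU: {bleu.score:.2f}")
print(f"chrF: {chrf.score:.2f}")

In practice a benchmark run would iterate over the full set of expert-reviewed sentence pairs in both translation directions and report the metrics per direction, but the scoring call itself stays the same as in this sketch.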