Khalil Hennara, Muhammad Hreden, Mohamed Motasim Hamed, Ahmad Bastati, Zeina Aldallal, Sara Chrouf, Safwan AlModhayan

Abstract
Arabic document OCR remains a challenging task because of the language's cursive script, diverse fonts, diacritics, and right-to-left orientation. Modern Multimodal Large Language Models (MLLMs) have advanced document understanding for high-resource languages, but their performance on Arabic remains limited. In this work, we introduce Baseer, a vision-language model fine-tuned specifically for Arabic document OCR. Using a large-scale dataset that combines synthetic and real-world documents, Baseer is trained with a decoder-only fine-tuning strategy that adapts a pre-trained MLLM while preserving its general visual features. We also present Misraj-DocOCR, a high-quality, expert-verified benchmark designed for rigorous evaluation of Arabic OCR systems. Our experiments show that Baseer significantly outperforms existing open-source and commercial solutions, achieving a word error rate (WER) of 0.25 and establishing a new state of the art in Arabic document OCR. Our results highlight the benefits of domain-specific adaptation of general-purpose MLLMs and establish a strong baseline for high-accuracy OCR on morphologically rich languages such as Arabic.
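To make the reported metric concrete: word error rate is the word-level edit distance (substitutions, deletions, and insertions) between the OCR output and the reference transcription, divided by the number of words in the reference. The sketch below is a minimal illustration of that standard definition, not the authors' evaluation code. It assumes plain whitespace tokenization with no text normalization (for example, no diacritic stripping), which may differ from how Misraj-DocOCR is actually scored; the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: tokens are whitespace-separated and compared verbatim
    (no diacritic or punctuation normalization).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# One substituted word out of four reference words gives a WER of 0.25.
print(word_error_rate("مرحبا بك في العالم", "مرحبا بكم في العالم"))  # 0.25
```

Under this definition, a WER of 0.25 corresponds on average to one word-level error per four reference words. Because insertions are counted, WER can exceed 1.0 when the output is much longer than the reference.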