MedMobile: A mobile-sized language model with expert-level clinical capabilities

Language models (LMs) have demonstrated expert-level reasoning and recall abilities in medicine. However, computational costs and privacy concerns are mounting barriers to wide-scale implementation. We introduce a parsimonious adaptation of phi-3-mini, MedMobile, a 3.8 billion parameter LM capable of running on a mobile device, for medical applications. We demonstrate that MedMobile scores 75.7% on the MedQA (USMLE), surpassing the passing mark for physicians (~60%), and approaching the scores of models 100 times its size. We subsequently perform a careful set of ablations, and demonstrate that chain of thought, ensembling, and fine-tuning lead to the greatest performance gains, while unexpectedly retrieval augmented generation fails to demonstrate significant improvements.