
Capabilities of Gemini Models in Medicine

Khaled Saab, Tao Tu, Wei-Hung Weng, Ryutaro Tanno, David Stutz, Ellery Wulczyn, Fan Zhang, Tim Strother, Chunjong Park, Elahe Vedadi, Juanma Zambrano Chaves, Szu-Yeu Hu, Mike Schaekermann, Aishwarya Kamath, Yong Cheng, David G. T. Barrett, Cathy Cheung, Basil Mustafa, Anil Palepu, Daniel McDuff, Le Hou, Tomer Golany, Luyang Liu, Jean-baptiste Alayrac, Neil Houlsby, Nenad Tomasev, Jan Freyberg, Charles Lau, Jonas Kemp, Jeremy Lai, Shekoofeh Azizi, Kimberly Kanada, SiWai Man, Kavita Kulkarni, Ruoxi Sun, Siamak Shakeri, Luheng He, Ben Caine, Albert Webson, Natasha Latysheva, Melvin Johnson, Philip Mansfield, Jian Lu, Ehud Rivlin, Jesper Anderson, Bradley Green, Renee Wong, Jonathan Krause, Jonathon Shlens, Ewa Dominowska, S. M. Ali Eslami, Claire Cui, Oriol Vinyals, Koray Kavukcuoglu, James Manyika, Jeff Dean, Demis Hassabis, Yossi Matias, Dale Webster, Joelle Barral, Greg Corrado, Christopher Semturs, S. Sara Mahdavi, Juraj Gottweis, Alan Karthikesalingam, Vivek Natarajan
Abstract

Excellence in a wide variety of medical applications poses considerable challenges for AI, requiring advanced reasoning, access to up-to-date medical knowledge and understanding of complex multimodal data. Gemini models, with strong general capabilities in multimodal and long-context reasoning, offer exciting possibilities in medicine. Building on these core strengths of Gemini, we introduce Med-Gemini, a family of highly capable multimodal models that are specialized in medicine with the ability to seamlessly use web search, and that can be efficiently tailored to novel modalities using custom encoders. We evaluate Med-Gemini on 14 medical benchmarks, establishing new state-of-the-art (SoTA) performance on 10 of them, and surpass the GPT-4 model family on every benchmark where a direct comparison is viable, often by a wide margin. On the popular MedQA (USMLE) benchmark, our best-performing Med-Gemini model achieves SoTA performance of 91.1% accuracy, using a novel uncertainty-guided search strategy. On 7 multimodal benchmarks including NEJM Image Challenges and MMMU (health & medicine), Med-Gemini improves over GPT-4V by an average relative margin of 44.5%. We demonstrate the effectiveness of Med-Gemini's long-context capabilities through SoTA performance on a needle-in-a-haystack retrieval task from long de-identified health records and medical video question answering, surpassing prior bespoke methods using only in-context learning. Finally, Med-Gemini's performance suggests real-world utility by surpassing human experts on tasks such as medical text summarization, alongside demonstrations of promising potential for multimodal medical dialogue, medical research and education. Taken together, our results offer compelling evidence for Med-Gemini's potential, although further rigorous evaluation will be crucial before real-world deployment in this safety-critical domain.
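To make the "uncertainty-guided search" idea mentioned in the abstract concrete, the sketch below shows one plausible way such a loop could work: sample several answers, measure their disagreement as an uncertainty proxy, and fall back to retrieval-augmented re-prompting only when the model is uncertain. This is an illustrative assumption, not the paper's actual implementation; the `sample_answer` and `web_search` callables are hypothetical placeholders for a model-sampling API and a search backend.

```python
from collections import Counter
from typing import Callable, List


def disagreement(answers: List[str]) -> float:
    """Fraction of sampled answers that disagree with the majority answer.

    0.0 means all samples agree (low uncertainty); values near 1.0 mean
    the samples are spread across many different answers (high uncertainty).
    """
    top_count = Counter(answers).most_common(1)[0][1]
    return 1.0 - top_count / len(answers)


def uncertainty_guided_answer(
    question: str,
    sample_answer: Callable[[str], str],  # hypothetical: draws one model answer for a prompt
    web_search: Callable[[str], str],     # hypothetical: returns retrieved text for a query
    num_samples: int = 5,
    uncertainty_threshold: float = 0.4,
) -> str:
    """Answer a question, invoking web search only when sampled answers disagree."""
    # Step 1: sample several candidate answers directly from the model.
    answers = [sample_answer(question) for _ in range(num_samples)]

    # Step 2: if the samples largely agree, return the majority answer.
    if disagreement(answers) <= uncertainty_threshold:
        return Counter(answers).most_common(1)[0][0]

    # Step 3: otherwise, retrieve external context and re-sample with it in the prompt.
    context = web_search(question)
    augmented_prompt = f"Context:\n{context}\n\nQuestion: {question}"
    answers = [sample_answer(augmented_prompt) for _ in range(num_samples)]
    return Counter(answers).most_common(1)[0][0]
```

The key design choice this illustrates is spending the cost of retrieval only on questions where self-consistency across samples is low, rather than searching for every query.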
