MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models

Recent advancements in foundation models have enhanced AI systems' capabilities in autonomous tool usage and reasoning. However, their ability in location- or map-based reasoning, which improves daily life by optimizing navigation, facilitating resource discovery, and streamlining logistics, has not been systematically studied. To bridge this gap, we introduce MapEval, a benchmark designed to assess diverse and complex map-based user queries with geo-spatial reasoning. MapEval features three task types (textual, API-based, and visual) that require collecting world information via map tools, processing heterogeneous geo-spatial contexts (e.g., named entities, travel distances, user reviews or ratings, images), and compositional reasoning, all of which state-of-the-art foundation models find challenging. Comprising 700 unique multiple-choice questions about locations across 180 cities and 54 countries, MapEval evaluates foundation models' ability to handle spatial relationships, map infographics, travel planning, and navigation challenges. Using MapEval, we conducted a comprehensive evaluation of 28 prominent foundation models. While no single model excelled across all tasks, Claude-3.5-Sonnet, GPT-4o, and Gemini-1.5-Pro achieved competitive performance overall. However, substantial performance gaps emerged, particularly in MapEval, where agents with Claude-3.5-Sonnet outperformed GPT-4o and Gemini-1.5-Pro by 16% and 21%, respectively, and the gaps widened even further in comparison to open-source LLMs. Our detailed analyses provide insights into the strengths and weaknesses of current models; still, all models fall short of human performance by more than 20% on average, struggling with complex map images and rigorous geo-spatial reasoning. This gap highlights MapEval's critical role in advancing general-purpose foundation models with stronger geo-spatial understanding.