BEV-CV: Birds-Eye-View Transform for Cross-View Geo-Localisation

Cross-view image matching for geo-localisation is a challenging problem due to the significant visual difference between aerial and ground-level viewpoints. Such matching provides localisation capabilities from geo-referenced images, eliminating the need for external devices or costly equipment. This enhances the capacity of agents to autonomously determine their position, navigate, and operate effectively in GNSS-denied environments. Current research employs a variety of techniques to reduce the domain gap, such as applying polar transforms to aerial images or synthesising between perspectives. However, these approaches generally rely on having a 360° field of view, limiting real-world feasibility. We propose BEV-CV, an approach introducing two key novelties with a focus on improving the real-world viability of cross-view geo-localisation. First, we bring ground-level images into a semantic Birds-Eye-View before matching embeddings, allowing for direct comparison with aerial image representations. Second, we adapt datasets into an application-realistic format: limited field-of-view images aligned to vehicle direction. BEV-CV achieves state-of-the-art recall accuracies, improving Top-1 rates on 70° crops of CVUSA and CVACT by 23% and 24% respectively. It also decreases computational requirements, reducing floating point operations to below those of previous works and decreasing embedding dimensionality by 33%, which together allow for faster localisation.
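To make the evaluation setting concrete, the following is a minimal sketch of the two ideas the abstract refers to: cropping a panorama to a limited field of view around a heading, and scoring retrieval with Top-K recall. It is not the BEV-CV implementation. The function names `crop_fov` and `top_k_recall`, the equirectangular layout with column 0 at a heading of 0°, cosine similarity as the matching score, and index-aligned ground/aerial pairs are all assumptions made for illustration.

```python
import numpy as np


def crop_fov(panorama, heading_deg, fov_deg=70.0):
    """Crop an H x W x C panorama to a horizontal FOV centred on heading_deg.

    Assumption: the panorama spans 360 degrees left to right and
    column 0 corresponds to a heading of 0 degrees.
    """
    width = panorama.shape[1]
    crop_w = int(round(width * fov_deg / 360.0))
    centre = int(round((heading_deg % 360.0) / 360.0 * width))
    # Wrap column indices so crops that straddle the image seam stay contiguous.
    cols = (np.arange(crop_w) + centre - crop_w // 2) % width
    return panorama[:, cols]


def top_k_recall(ground_emb, aerial_emb, k=1):
    """Fraction of ground queries whose true aerial match ranks in the top k.

    Assumption: row i of ground_emb matches row i of aerial_emb, and
    similarity is cosine similarity between embeddings.
    """
    g = ground_emb / np.linalg.norm(ground_emb, axis=1, keepdims=True)
    a = aerial_emb / np.linalg.norm(aerial_emb, axis=1, keepdims=True)
    sims = g @ a.T
    true_sim = np.diag(sims)
    # Rank = number of aerial embeddings strictly more similar than the true match.
    rank = (sims > true_sim[:, None]).sum(axis=1)
    return float((rank < k).mean())
```

For example, `crop_fov(pano, heading_deg=90.0, fov_deg=70.0)` keeps roughly 70/360 of the panorama's columns centred on the 90° direction, and `top_k_recall(ground, aerial, k=1)` gives a Top-1 rate for a set of paired embeddings.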