University-1652: A Multi-view Multi-source Benchmark for Drone-based Geo-localization

We consider the problem of cross-view geo-localization. The primary challenge of this task is to learn features that are robust to large viewpoint changes. Existing benchmarks can help, but they are limited in the number of viewpoints. They usually provide image pairs covering only two viewpoints, e.g., satellite and ground, which may compromise feature learning. In this paper, we argue that, besides phone cameras and satellites, drones could serve as a third platform for the geo-localization problem. In contrast to traditional ground-view images, drone-view images meet fewer obstacles, e.g., trees, and can provide a comprehensive view when the drone flies around the target place. To verify the effectiveness of the drone platform, we introduce a new multi-view multi-source benchmark for drone-based geo-localization, named University-1652. University-1652 contains data from three platforms, i.e., synthetic drones, satellites and ground cameras, of 1,652 university buildings around the world. To our knowledge, University-1652 is the first drone-based geo-localization dataset, and it enables two new tasks, i.e., drone-view target localization and drone navigation. As the name implies, drone-view target localization aims to predict the location of the target place from drone-view images. Conversely, given a satellite-view query image, drone navigation aims to drive the drone to the area of interest shown in the query. We use this dataset to analyze a variety of off-the-shelf CNN features and propose a strong CNN baseline on this challenging dataset. The experiments show that University-1652 helps the model learn viewpoint-invariant features and that the learned model generalizes well to real-world scenarios.
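
To make the two tasks concrete, the sketch below frames both as cross-view image retrieval: features from one platform serve as queries and are ranked against a gallery of features from the other platform. This framing is an assumption for illustration, since the abstract does not spell out the evaluation protocol. The function names (`rank_gallery`, `recall_at_1`), the cosine-similarity ranking and the random toy features are all hypothetical stand-ins; a real pipeline would use CNN features extracted from the dataset images.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so a dot product equals cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def rank_gallery(query_feats, gallery_feats):
    """Return, for each query, gallery indices sorted from most to least similar."""
    q = l2_normalize(np.asarray(query_feats, dtype=np.float64))
    g = l2_normalize(np.asarray(gallery_feats, dtype=np.float64))
    sims = q @ g.T  # shape: (num_queries, num_gallery)
    return np.argsort(-sims, axis=1)


def recall_at_1(ranking, query_labels, gallery_labels):
    """Fraction of queries whose top-ranked gallery item shares their label."""
    top1 = np.asarray(gallery_labels)[ranking[:, 0]]
    return float(np.mean(top1 == np.asarray(query_labels)))


# Toy stand-in features (assumption): one vector per building and platform.
rng = np.random.default_rng(0)
num_buildings, dim = 5, 16
building_codes = rng.normal(size=(num_buildings, dim))
satellite_feats = building_codes + 0.05 * rng.normal(size=(num_buildings, dim))
drone_feats = building_codes + 0.05 * rng.normal(size=(num_buildings, dim))
labels = np.arange(num_buildings)

# Drone-view target localization: drone query, satellite gallery.
drone_to_satellite = rank_gallery(drone_feats, satellite_feats)

# Drone navigation: satellite query, drone gallery.
satellite_to_drone = rank_gallery(satellite_feats, drone_feats)

print(recall_at_1(drone_to_satellite, labels, labels))
print(recall_at_1(satellite_to_drone, labels, labels))
```

Under this framing the two tasks differ only in which platform supplies the query and which supplies the gallery, so a single viewpoint-invariant feature extractor could in principle serve both.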