Beyond Geo-localization: Fine-grained Orientation of Street-view Images by Cross-view Matching with Satellite Imagery

Street-view imagery provides us with novel experiences to explore different places remotely. Carefully calibrated street-view images (e.g., Google Street View) can be used for different downstream tasks, e.g., navigation and map feature extraction. As personal high-quality cameras have become much more affordable and portable, an enormous amount of crowdsourced street-view images are uploaded to the internet, but commonly with missing or noisy sensor information. To prepare this hidden treasure for "ready-to-use" status, determining the missing location information and camera orientation angles are two equally important tasks. Recent methods have achieved high performance on geo-localization of street-view images by cross-view matching with a pool of geo-referenced satellite imagery. However, most of the existing works focus more on geo-localization than estimating the image orientation. In this work, we restate the importance of finding fine-grained orientation for street-view images, formally define the problem, and provide a set of evaluation metrics to assess the quality of the orientation estimation. We propose two methods to improve the granularity of the orientation estimation, achieving 82.4% and 72.3% accuracy for images with estimated angle errors below 2 degrees on the CVUSA and CVACT datasets, corresponding to 34.9% and 28.2% absolute improvements over previous works. Integrating fine-grained orientation estimation into training also improves geo-localization performance, yielding top-1 recall of 95.5%/85.5% and 86.8%/80.4% for orientation known/unknown tests on the two datasets.
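As a minimal sketch of the kind of accuracy metric described above (fraction of images with estimated orientation error below a fixed angular threshold), the snippet below computes a wrap-around angular error and a threshold accuracy. The function names, the wrapping convention, and the toy inputs are assumptions for illustration, not the paper's exact evaluation protocol.

```python
import numpy as np

def angular_error(pred_deg, gt_deg):
    """Smallest absolute difference between two angles in degrees (wraps at 360)."""
    diff = np.abs(np.asarray(pred_deg, dtype=float) - np.asarray(gt_deg, dtype=float)) % 360.0
    return np.minimum(diff, 360.0 - diff)

def accuracy_within(pred_deg, gt_deg, threshold_deg=2.0):
    """Fraction of samples whose orientation error is below the threshold."""
    return float(np.mean(angular_error(pred_deg, gt_deg) < threshold_deg))

# Toy example: three predicted headings vs. ground truth.
preds = [359.2, 45.0, 180.5]
gts   = [1.0,   44.3, 170.0]
print(accuracy_within(preds, gts, threshold_deg=2.0))  # -> 0.666..., since 2 of 3 errors are below 2 degrees
```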