AAAI 2025

Semantic Ambiguity Modeling and Propagation for Fine-Grained Visual Cross View Geo-Localization

Mingtao Feng, Fenghao Tian, Jianqiao Luo, Zijie Wu, Weisheng Dong, Yaonan Wang, Ajmal Saeed Mian

Cited by 4

Abstract

In this paper, we introduce a novel approach to fine-grained cross-view geo-localization. Our method uses homography estimation to align a warped ground image with a corresponding GPS-tagged satellite image covering the same area. We first apply a differentiable spherical transform, grounded in the imaging geometry, to bring the perspective of the ground image in line with the satellite map. This transformation places the ground and aerial images in the same view and on the same plane, reducing the task to an image alignment problem. To handle challenges such as occlusion, small overlap, and seasonal variation, we propose a robust correlation-aware homography estimator that aligns the matching parts of the transformed ground image with the satellite image. Our method achieves sub-pixel resolution and meter-level GPS accuracy: the homography matrix maps the center point of the transformed ground image onto the satellite image to give the location, and a point above the center along the central axis gives the orientation of the ground camera. Running at 30 FPS, our method outperforms state-of-the-art techniques, reducing the mean metric localization error by 21.3% and 32.4% on the same-area and cross-area generalization tasks of the VIGOR benchmark, respectively, and by 34.4% in the same-area evaluation on the KITTI benchmark.

Recently, there has been growing interest in fine-grained cross-view geo-localization, which assumes that a ground image and a corresponding GPS-labeled satellite image patch covering the same area are available. Existing methods can be divided into two categories: those based on repeated sampling [27, 24, 7, 26] and those employing descriptor candidates splitting from satellite image features
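The first step summarized in the abstract, warping the ground panorama into a top-down view on the satellite image plane, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes an equirectangular panorama, a flat ground plane, a known camera height, north-up BEV output, and a specific azimuth and elevation convention. The names `bev_sampling_grid`, `warp_to_bev`, `meters_per_px` and `cam_height` are hypothetical, and nearest-neighbour lookup stands in for the bilinear sampling a differentiable version would use.

```python
import numpy as np

def bev_sampling_grid(pano_h, pano_w, bev_size, meters_per_px, cam_height):
    """Hypothetical helper: map each BEV (top-down) pixel to a fractional
    (row, col) in an equirectangular panorama, assuming flat ground."""
    # Metric ground coordinates of BEV pixel centres; the camera sits at the centre.
    idx = (np.arange(bev_size) + 0.5 - bev_size / 2) * meters_per_px
    x, y = np.meshgrid(idx, -idx)  # x: right (east), y: up in the BEV image (north)
    dist = np.hypot(x, y)
    # Azimuth measured clockwise from the BEV "up" direction, in (-pi, pi].
    azimuth = np.arctan2(x, y)
    # Elevation of the ground point as seen from the camera (never positive).
    elevation = -np.arctan2(cam_height, dist)
    # Assumed panorama layout: column 0 = azimuth -pi, row 0 = straight up.
    cols = (azimuth / (2 * np.pi) + 0.5) * pano_w
    rows = (0.5 - elevation / np.pi) * pano_h
    return rows, cols

def warp_to_bev(pano, bev_size, meters_per_px, cam_height):
    """Nearest-neighbour warp of a panorama (H x W or H x W x C) into a BEV image."""
    h, w = pano.shape[:2]
    rows, cols = bev_sampling_grid(h, w, bev_size, meters_per_px, cam_height)
    r = np.clip(np.floor(rows).astype(int), 0, h - 1)
    c = np.floor(cols).astype(int) % w  # wrap around the 360-degree seam
    return pano[r, c]
```

Because every BEV pixel lies on the ground, only the lower half of the panorama is ever sampled; distant points approach the horizon row, and the point directly below the camera maps to the bottom row.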
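Once a homography between the transformed ground image and the satellite image has been estimated, the location and orientation read-out described in the abstract comes down to mapping two points. The sketch below makes several assumptions. The camera sits at the centre of a square BEV image whose upward direction is the camera's forward direction, and the satellite image is north-up. The helper names `apply_homography` and `localize` and the `axis_offset` parameter are hypothetical. The estimator that produces `H` is not shown.

```python
import numpy as np

def apply_homography(H, pt):
    """Map a 2-D point through a 3x3 homography, including the homogeneous divide."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return np.array([x / w, y / w])

def localize(H, bev_size, axis_offset=10.0):
    """Hypothetical helper: return (location, heading_deg) in satellite pixels.

    The BEV centre is taken as the camera position, and a point `axis_offset`
    pixels above it on the vertical central axis marks the forward direction.
    """
    c = bev_size / 2.0
    center = apply_homography(H, (c, c))
    ahead = apply_homography(H, (c, c - axis_offset))  # image y grows downwards
    dx, dy = ahead - center
    # Heading measured clockwise from "up" (assumed north) in the satellite image.
    heading = np.degrees(np.arctan2(dx, -dy)) % 360.0
    return center, heading
```

For example, `localize(np.eye(3), 100)` returns the BEV centre `(50, 50)` with a heading of 0 degrees. A homography that rotates the BEV by 90 degrees clockwise yields a heading of 90 degrees. Scaling `H` by any nonzero factor leaves both outputs unchanged because of the homogeneous divide.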