Abstract: Extracting building footprints from aerial images is essential for precise urban mapping with photogrammetric computer vision technologies. Existing approaches mainly assume that the roof and footprint of a building overlap well, which may not hold in off-nadir aerial images, where there is often a large offset between them. In this paper, we propose an offset vector learning scheme, which turns the building footprint extraction problem in off-nadir images into an instance-level joint prediction problem of the building roof and its corresponding "roof to footprint" offset vector. The footprint can then be estimated by translating the predicted roof mask according to the predicted offset vector. We further propose a simple but effective feature-level offset augmentation module, which significantly refines the offset vector prediction at little extra cost. Moreover, a new dataset, Buildings in Off-Nadir Aerial Images (BONAI), is created and released in this paper. It contains 268,958 building instances across 3,300 aerial images, with fully annotated instance-level roof, footprint, and corresponding offset vector for each building. Experiments on the BONAI dataset demonstrate that our method achieves state-of-the-art performance, outperforming other competitors by 3.37 to 7.39 points in F1-score. The code, datasets, and trained models are available at https://github.com/jwwangchn/BONAI.git.
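The translation step described above (footprint = roof mask shifted by the offset vector) can be illustrated with a minimal NumPy sketch. This is not the released BONAI implementation. The function name `translate_mask`, the `(dx, dy)` pixel convention (x to the right, y downward), and the rounding to whole pixels are all assumptions made for illustration.

```python
import numpy as np


def translate_mask(roof_mask, offset_xy):
    """Shift a 2D binary roof mask by an integer (dx, dy) offset.

    Hypothetical helper: pixels shifted outside the image are dropped
    and vacated pixels are filled with zeros.
    """
    # Assumption: offsets are rounded to whole pixels for simplicity.
    dx, dy = int(round(offset_xy[0])), int(round(offset_xy[1]))
    h, w = roof_mask.shape
    footprint = np.zeros_like(roof_mask)

    # Source window: the part of the roof mask that stays inside the image.
    y0, y1 = max(0, -dy), min(h, h - dy)
    x0, x1 = max(0, -dx), min(w, w - dx)
    if y0 >= y1 or x0 >= x1:
        return footprint  # the whole mask was shifted out of view

    # Copy the source window to its translated position.
    footprint[y0 + dy:y1 + dy, x0 + dx:x1 + dx] = roof_mask[y0:y1, x0:x1]
    return footprint


# Toy example: a 2x2 roof on a 6x6 image, with an assumed
# "roof to footprint" offset of 2 px right and 3 px down.
roof = np.zeros((6, 6), dtype=np.uint8)
roof[1:3, 1:3] = 1
footprint = translate_mask(roof, (2.0, 3.0))
```

In this toy case the footprint occupies rows 4 to 5 and columns 3 to 4, which is the roof region moved by the offset. A practical pipeline would apply the same shift per building instance, using each instance's own predicted offset.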