School of Geodesy and Geomatics, Wuhan University, China
Abstract: True Digital Orthophoto Maps (TDOMs) are essential products for digital twins and Geographic Information Systems (GIS). Traditionally, TDOM generation involves a complex photogrammetric pipeline whose output may deteriorate due to various challenges, including inaccurate Digital Surface Models (DSMs), degraded occlusion detection, and visual artifacts in weakly textured regions and on reflective surfaces. To address these challenges, we introduce TOrtho-Gaussian, a novel method inspired by 3D Gaussian Splatting (3DGS) that generates TDOMs through orthogonal splatting of optimized anisotropic Gaussian kernels. More specifically, we first simplify orthophoto generation by orthographically splatting the Gaussian kernels onto 2D image planes, formulating a geometrically elegant solution that avoids the need for an explicit DSM and occlusion detection. Second, to produce TDOMs of large-scale areas, a divide-and-conquer strategy is adopted to optimize the memory usage and time efficiency of 3DGS training and rendering. Lastly, we design a fully anisotropic Gaussian kernel that adapts to the varying characteristics of different regions, particularly improving the rendering quality of reflective surfaces and slender structures. Extensive experimental evaluations demonstrate that our method outperforms existing commercial software in several aspects, including the accuracy of building boundaries and the visual quality of low-texture regions and building facades. These results underscore the potential of our approach for large-scale urban scene reconstruction, offering a robust alternative for enhancing TDOM quality and scalability.
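As a rough illustration of the orthographic splatting idea, the sketch below projects the mean and covariance of one 3D Gaussian onto an orthographic image plane. Because an orthographic projection is linear, the 2D covariance follows from a single constant matrix. The function name `ortho_splat`, the rotation/translation convention, and the `pixels_per_unit` scale are assumptions made for this example and are not taken from the TOrtho-Gaussian implementation.

```python
import numpy as np

def ortho_splat(mean3d, cov3d, rotation, translation, pixels_per_unit):
    """Project one 3D Gaussian onto an orthographic image plane.

    rotation (3x3) and translation (3,) map world to view coordinates;
    the view z-axis is the viewing direction and is simply dropped.
    All parameter names here are hypothetical.
    """
    # Constant projection matrix: keep view x and y, scale to pixels.
    proj = pixels_per_unit * np.array([[1.0, 0.0, 0.0],
                                       [0.0, 1.0, 0.0]])
    mean_view = rotation @ mean3d + translation
    mean2d = proj @ mean_view
    # A linear map A sends covariance C to A C A^T.
    a = proj @ rotation
    cov2d = a @ cov3d @ a.T
    return mean2d, cov2d

# Nadir view with identity rotation: depth variance has no effect on the footprint.
mean2d, cov2d = ortho_splat(
    mean3d=np.array([1.0, 2.0, 50.0]),
    cov3d=np.diag([4.0, 1.0, 9.0]),
    rotation=np.eye(3),
    translation=np.zeros(3),
    pixels_per_unit=2.0,
)
```

In this toy case the footprint depends only on the horizontal extent of the Gaussian, which is the property that lets an orthophoto be rendered without an explicit DSM lookup.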
Abstract: Over the last few decades, image-based building surface reconstruction has garnered substantial research interest and has been applied across various fields, such as heritage preservation and architectural planning. Compared to traditional photogrammetric and NeRF-based solutions, recent Gaussian fields-based methods have exhibited significant potential in generating surface meshes due to their time-efficient training and detailed 3D information preservation. However, most Gaussian fields-based methods are trained with all image pixels, encompassing both building and non-building areas, which introduces significant noise into building meshes and degrades time efficiency. This paper proposes a novel framework, Masked Gaussian Fields (MGFs), designed to generate accurate surface reconstructions of buildings in a time-efficient way. The framework first applies EfficientSAM and COLMAP to generate multi-level masks of buildings and the corresponding masked point clouds. Subsequently, the masked Gaussian fields are trained by integrating two innovative losses: a multi-level perceptual masked loss focused on constructing building regions and a boundary loss aimed at enhancing the details of the boundaries between different masks. Finally, we improve the tetrahedral surface mesh extraction method based on the masked Gaussian spheres. Comprehensive experiments on UAV images demonstrate that, compared to the traditional method and several NeRF-based and Gaussian-based SOTA solutions, our approach significantly improves both the accuracy and efficiency of building surface reconstruction. Notably, as a byproduct, there is an additional gain in the novel view synthesis of buildings.
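The effect of restricting training to building pixels can be sketched with a plain masked L1 loss. This is a deliberately simpler stand-in, not the multi-level perceptual masked loss or the boundary loss described in the abstract; the function name `masked_l1_loss` and the array shapes are assumptions made for this example.

```python
import numpy as np

def masked_l1_loss(rendered, target, mask, eps=1e-8):
    """Mean absolute error over pixels where mask is nonzero.

    rendered, target: float arrays of shape (H, W, C).
    mask: array of shape (H, W), nonzero on building pixels.
    Hypothetical helper, not the loss used by MGFs.
    """
    weights = (np.asarray(mask) != 0).astype(np.float64)[..., None]  # (H, W, 1)
    abs_err = np.abs(rendered - target) * weights
    n_values = weights.sum() * rendered.shape[-1]
    return float(abs_err.sum() / (n_values + eps))

# Only the top-left pixel is a building pixel; errors elsewhere are ignored.
rendered = np.zeros((2, 2, 3))
target = np.ones((2, 2, 3))
mask = np.array([[1, 0], [0, 0]])
loss = masked_l1_loss(rendered, target, mask)
```

Errors on non-building pixels contribute nothing to the gradient, which is the basic mechanism by which masking keeps background noise out of the reconstructed building mesh.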
Abstract: Over the last decades, ample progress has been made on Structure from Motion (SfM). However, the vast majority of existing methods work offline, i.e., images are first captured and then fed together into an SfM pipeline to obtain poses and a sparse point cloud. In this work, by contrast, we present an on-the-fly SfM: SfM runs online during image capture, and the pose and points of each newly taken image are estimated on the fly, i.e., what you capture is what you get. Specifically, our approach first employs a vocabulary tree, trained without supervision on learning-based global features, for fast image retrieval of each newly captured (fly-in) image. Then, a robust feature matching mechanism with least-squares matching (LSM) is presented to improve image registration performance. Finally, by investigating the influence of the images neighboring and connected to the new fly-in image, an efficient hierarchical weighted local bundle adjustment (BA) is used for optimization. Extensive experimental results demonstrate that on-the-fly SfM can robustly register images online while they are being captured.
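The retrieval step can be illustrated with a minimal sketch that ranks already-registered images by cosine similarity of their global descriptors. Note that this is a brute-force substitute for the vocabulary tree mentioned in the abstract, chosen only to keep the example short; the function name `retrieve_top_k` and the toy descriptors are assumptions.

```python
import numpy as np

def retrieve_top_k(query_desc, db_descs, k=3):
    """Return indices and similarities of the k most similar database images.

    query_desc: (D,) global descriptor of the newly captured image.
    db_descs: (N, D) descriptors of already registered images.
    Brute-force cosine ranking; a vocabulary tree would avoid scanning all N.
    """
    q = query_desc / np.linalg.norm(query_desc)
    db = db_descs / np.linalg.norm(db_descs, axis=1, keepdims=True)
    sims = db @ q
    order = np.argsort(-sims)[:k]
    return order.tolist(), sims[order].tolist()

# Toy 2D descriptors for three registered images and one new image.
db_descs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query = np.array([1.0, 0.1])
indices, scores = retrieve_top_k(query, db_descs, k=2)
```

The retrieved neighbors are the candidates for feature matching, so the cost of registering each new image depends on k rather than on the total number of images captured so far.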
Abstract: Many visual simultaneous localization and mapping (SLAM) systems have been shown to be accurate and robust, and have real-time performance capabilities on both indoor and ground datasets. However, these methods can be problematic when dealing with aerial frames captured by a camera mounted on an unmanned aerial vehicle (UAV), because the flight height of the UAV can be difficult to control and is easily affected by the environment. To cope with lost tracking, many visual SLAM systems employ a relocalization strategy, in which the tracking thread resumes online operation by inspecting the connections between subsequent new frames and the map generated before tracking was lost. To solve the problem of the map missing after tracking is lost, which is an issue in many applications, we present a method based on monocular visual SLAM that reconstructs a complete global map of UAV datasets by sequentially merging submaps via the corresponding undirected connected graph. Specifically, submaps are repeatedly generated, each from the initialization process to the place where tracking is lost, and a corresponding undirected connected graph is built by considering these submaps as nodes and the common map points within two submaps as edges. The common map points are determined by the bag-of-words (BoW) method, and submaps are merged if they are found to be connected with the online map in the undirected connected graph. To demonstrate the performance of the proposed method, we evaluated it on a UAV dataset, and the experimental results showed that, in the case of several tracking failures, the integrity of the mapping was significantly better than that of the current mainstream SLAM method.
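The graph construction described above can be sketched as follows, with submaps as nodes and an edge wherever two submaps share map points. In this sketch each submap is a set of hypothetical map-point IDs and common points are found by set intersection, whereas the abstract determines them with BoW; the function names and the `min_common` threshold are assumptions.

```python
from collections import deque

def build_submap_graph(submaps, min_common=1):
    """Build an undirected graph: one node per submap, one edge per shared-point pair.

    submaps: dict mapping submap name to a set of map-point IDs (hypothetical layout).
    """
    names = list(submaps)
    graph = {name: set() for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if len(submaps[a] & submaps[b]) >= min_common:
                graph[a].add(b)
                graph[b].add(a)
    return graph

def connected_to(graph, start):
    """Breadth-first search for every submap reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen

submaps = {
    "online": {1, 2, 3},
    "s1": {3, 4},
    "s2": {4, 5},
    "s3": {9},
}
graph = build_submap_graph(submaps)
mergeable = connected_to(graph, "online")
```

Here `s2` shares no points with the online map directly but is still reachable through `s1`, while `s3` stays isolated until a later frame links it to the rest of the graph.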
Abstract: In recent years, building change detection methods have made great progress by introducing deep learning, but they still suffer from extracted features that are not discriminative enough, resulting in incomplete regions and irregular boundaries. To tackle this problem, we propose a dual task constrained deep Siamese convolutional network (DTCDSCN) model, which contains three sub-networks: a change detection network and two semantic segmentation networks. DTCDSCN accomplishes change detection and semantic segmentation at the same time, which helps it learn more discriminative object-level features and obtain a complete change detection map. Furthermore, we introduce a dual attention module (DAM) to exploit the interdependencies between channels and spatial positions, which improves the feature representation. We also improve the focal loss function to mitigate the sample imbalance problem. The experimental results obtained with the WHU building dataset show that the proposed method is effective for building change detection and achieves state-of-the-art performance in terms of four metrics: precision, recall, F1-score, and intersection over union.
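For reference, the four reported metrics can be computed from a binary change map and its ground truth as in the sketch below. These are the standard definitions; the function name `change_metrics` and the convention of returning 0.0 for an empty denominator are assumptions made for this example.

```python
import numpy as np

def change_metrics(pred, truth):
    """Precision, recall, F1-score, and IoU for binary change maps."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.sum(pred & truth))    # changed pixels correctly detected
    fp = int(np.sum(pred & ~truth))   # false alarms
    fn = int(np.sum(~pred & truth))   # missed changes
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}

# One true positive, one false alarm, one missed change.
metrics = change_metrics([1, 1, 0, 0], [1, 0, 1, 0])
```

Unchanged pixels that are correctly predicted as unchanged do not enter any of the four scores, which is why these metrics remain informative when changed pixels are a small minority.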