Abstract:In the domain of the U.S. Army modeling and simulation, the availability of high quality annotated 3D data is pivotal to creating virtual environments for training and simulations. Traditional methodologies for 3D semantic and instance segmentation, such as KpConv, RandLA, Mask3D, etc., are designed to train on extensive labeled datasets to obtain satisfactory performance in practical tasks. This requirement presents a significant challenge, given the inherent scarcity of manually annotated 3D datasets, particularly for the military use cases. Recognizing this gap, our previous research leverages the One World Terrain data repository manually annotated databases, as showcased at IITSEC 2019 and 2021, to enrich the training dataset for deep learning models. However, collecting and annotating large scale 3D data for specific tasks remains costly and inefficient. To this end, the objective of this research is to design and develop a comprehensive and efficient framework for 3D segmentation tasks to assist in 3D data annotation. This framework integrates Grounding DINO and Segment anything Model, augmented by an enhancement in 2D image rendering via 3D mesh. Furthermore, the authors have also developed a user friendly interface that facilitates the 3D annotation process, offering intuitive visualization of rendered images and the 3D point cloud.
Abstract:3D Gaussian Splatting (3DGS) has recently advanced radiance field reconstruction by offering superior capabilities for novel view synthesis and real-time rendering speed. However, its strategy of blending optimization and adaptive density control might lead to sub-optimal results; it can sometimes yield noisy geometry and blurry artifacts due to prioritizing optimizing large Gaussians at the cost of adequately densifying smaller ones. To address this, we introduce AtomGS, consisting of Atomized Proliferation and Geometry-Guided Optimization. The Atomized Proliferation constrains ellipsoid Gaussians of various sizes into more uniform-sized Atom Gaussians. The strategy enhances the representation of areas with fine features by placing greater emphasis on densification in accordance with scene details. In addition, we proposed a Geometry-Guided Optimization approach that incorporates an Edge-Aware Normal Loss. This optimization method effectively smooths flat surfaces while preserving intricate details. Our evaluation shows that AtomGS outperforms existing state-of-the-art methods in rendering quality. Additionally, it achieves competitive accuracy in geometry reconstruction and offers a significant improvement in training speed over other SDF-based methods. More interactive demos can be found in our website (https://rongliu-leo.github.io/AtomGS/).
Abstract:The area of Video Camouflaged Object Detection (VCOD) presents unique challenges in the field of computer vision due to texture similarities between target objects and their surroundings, as well as irregular motion patterns caused by both objects and camera movement. In this paper, we introduce TokenMotion (TMNet), which employs a transformer-based model to enhance VCOD by extracting motion-guided features using a learnable token selection. Evaluated on the challenging MoCA-Mask dataset, TMNet achieves state-of-the-art performance in VCOD. It outperforms the existing state-of-the-art method by a 12.8% improvement in weighted F-measure, an 8.4% enhancement in S-measure, and a 10.7% boost in mean IoU. The results demonstrate the benefits of utilizing motion-guided features via learnable token selection within a transformer-based framework to tackle the intricate task of VCOD.
Abstract:In this work, we target the problem of uncertain points refinement for image-based LiDAR point cloud semantic segmentation (LiDAR PCSS). This problem mainly results from the boundary-blurring problem of convolution neural networks (CNNs) and quantitation loss of spherical projection, which are often hard to avoid for common image-based LiDAR PCSS approaches. We propose a plug-and-play transformer-based uncertain point refiner (TransUPR) to address the problem. Through local feature aggregation, uncertain point localization, and self-attention-based transformer design, TransUPR, integrated into an existing range image-based LiDAR PCSS approach (e.g., CENet), achieves the state-of-the-art performance (68.2% mIoU) on Semantic-KITTI benchmark, which provides a performance improvement of 0.6% on the mIoU.
Abstract:Although various 3D datasets with different functions and scales have been proposed recently, it remains challenging for individuals to complete the whole pipeline of large-scale data collection, sanitization, and annotation. Moreover, the created datasets usually suffer from extremely imbalanced class distribution or partial low-quality data samples. Motivated by this, we explore the procedurally synthetic 3D data generation paradigm to equip individuals with the full capability of creating large-scale annotated photogrammetry point clouds. Specifically, we introduce a synthetic aerial photogrammetry point clouds generation pipeline that takes full advantage of open geospatial data sources and off-the-shelf commercial packages. Unlike generating synthetic data in virtual games, where the simulated data usually have limited gaming environments created by artists, the proposed pipeline simulates the reconstruction process of the real environment by following the same UAV flight pattern on different synthetic terrain shapes and building densities, which ensure similar quality, noise pattern, and diversity with real data. In addition, the precise semantic and instance annotations can be generated fully automatically, avoiding the expensive and time-consuming manual annotation. Based on the proposed pipeline, we present a richly-annotated synthetic 3D aerial photogrammetry point cloud dataset, termed STPLS3D, with more than 16 $km^2$ of landscapes and up to 18 fine-grained semantic categories. For verification purposes, we also provide a parallel dataset collected from four areas in the real environment. Extensive experiments conducted on our datasets demonstrate the effectiveness and quality of the proposed synthetic dataset.
Abstract:In recent years, photogrammetry has been widely used in many areas to create photorealistic 3D virtual data representing the physical environment. The innovation of small unmanned aerial vehicles (sUAVs) has provided additional high-resolution imaging capabilities with low cost for mapping a relatively large area of interest. These cutting-edge technologies have caught the US Army and Navy's attention for the purpose of rapid 3D battlefield reconstruction, virtual training, and simulations. Our previous works have demonstrated the importance of information extraction from the derived photogrammetric data to create semantic-rich virtual environments (Chen et al., 2019). For example, an increase of simulation realism and fidelity was achieved by segmenting and replacing photogrammetric trees with game-ready tree models. In this work, we further investigated the semantic information extraction problem and focused on the ground material segmentation and object detection tasks. The main innovation of this work was that we leveraged both the original 2D images and the derived 3D photogrammetric data to overcome the challenges faced when using each individual data source. For ground material segmentation, we utilized an existing convolutional neural network architecture (i.e., 3DMV) which was originally designed for segmenting RGB-D sensed indoor data. We improved its performance for outdoor photogrammetric data by introducing a depth pooling layer in the architecture to take into consideration the distance between the source images and the reconstructed terrain model. To test the performance of our improved 3DMV, a ground truth ground material database was created using data from the One World Terrain (OWT) data repository. Finally, a workflow for importing the segmented ground materials into a virtual simulation scene was introduced, and visual results are reported in this paper.
Abstract:In the globalized economic world, it has become important to understand the purpose behind infrastructural and construction initiatives occurring within developing regions of the earth. This is critical when the financing for such projects must be coming from external sources, as is occurring throughout major portions of the African continent. When it comes to imagery analysis to research these regions, ground and aerial coverage is either non-existent or not commonly acquired. However, imagery from a large number of commercial, private, and government satellites have produced enormous datasets with global coverage, compiling geospatial resources that can be mined and processed using machine learning algorithms and neural networks. The downside is that a majority of these geospatial data resources are in a state of technical stasis, as it is difficult to quickly parse and determine a plan for request and processing when acquiring satellite image data. A goal of this research is to allow automated monitoring for largescale infrastructure projects, such as railways, to determine reliable metrics that define and predict the direction construction initiatives could take, allowing for a directed monitoring via narrowed and targeted satellite imagery requests. By utilizing photogrammetric techniques on available satellite data to create 3D Meshes and Digital Surface Models (DSM) we hope to effectively predict transport routes. In understanding the potential directions that largescale transport infrastructure will take through predictive modeling, it becomes much easier to track, understand, and monitor progress, especially in areas with limited imagery coverage.
Abstract:With state-of-the-art sensing and photogrammetric techniques, Microsoft Bing Maps team has created over 125 highly detailed 3D cities from 11 different countries that cover hundreds of thousands of square kilometer areas. The 3D city models were created using the photogrammetric technique with high-resolution images that were captured from aircraft-mounted cameras. Such a large 3D city database has caught the attention of the US Army for creating virtual simulation environments to support military operations. However, the 3D city models do not have semantic information such as buildings, vegetation, and ground and cannot allow sophisticated user-level and system-level interaction. At I/ITSEC 2019, the authors presented a fully automated data segmentation and object information extraction framework for creating simulation terrain using UAV-based photogrammetric data. This paper discusses the next steps in extending our designed data segmentation framework for segmenting 3D city data. In this study, the authors first investigated the strengths and limitations of the existing framework when applied to the Bing data. The main differences between UAV-based and aircraft-based photogrammetric data are highlighted. The data quality issues in the aircraft-based photogrammetric data, which can negatively affect the segmentation performance, are identified. Based on the findings, a workflow was designed specifically for segmenting Bing data while considering its characteristics. In addition, since the ultimate goal is to combine the use of both small unmanned aerial vehicle (UAV) collected data and the Bing data in a virtual simulation environment, data from these two sources needed to be aligned and registered together. To this end, the authors also proposed a data registration workflow that utilized the traditional iterative closest point (ICP) with the extracted semantic information.
Abstract:At I/ITSEC 2019, the authors presented a fully-automated workflow to segment 3D photogrammetric point-clouds/meshes and extract object information, including individual tree locations and ground materials (Chen et al., 2019). The ultimate goal is to create realistic virtual environments and provide the necessary information for simulation. We tested the generalizability of the previously proposed framework using a database created under the U.S. Army's One World Terrain (OWT) project with a variety of landscapes (i.e., various buildings styles, types of vegetation, and urban density) and different data qualities (i.e., flight altitudes and overlap between images). Although the database is considerably larger than existing databases, it remains unknown whether deep-learning algorithms have truly achieved their full potential in terms of accuracy, as sizable data sets for training and validation are currently lacking. Obtaining large annotated 3D point-cloud databases is time-consuming and labor-intensive, not only from a data annotation perspective in which the data must be manually labeled by well-trained personnel, but also from a raw data collection and processing perspective. Furthermore, it is generally difficult for segmentation models to differentiate objects, such as buildings and tree masses, and these types of scenarios do not always exist in the collected data set. Thus, the objective of this study is to investigate using synthetic photogrammetric data to substitute real-world data in training deep-learning algorithms. We have investigated methods for generating synthetic UAV-based photogrammetric data to provide a sufficiently sized database for training a deep-learning algorithm with the ability to enlarge the data size for scenarios in which deep-learning models have difficulties.
Abstract:Our previous works have demonstrated that visually realistic 3D meshes can be automatically reconstructed with low-cost, off-the-shelf unmanned aerial systems (UAS) equipped with capable cameras, and efficient photogrammetric software techniques. However, such generated data do not contain semantic information/features of objects (i.e., man-made objects, vegetation, ground, object materials, etc.) and cannot allow the sophisticated user-level and system-level interaction. Considering the use case of the data in creating realistic virtual environments for training and simulations (i.e., mission planning, rehearsal, threat detection, etc.), segmenting the data and extracting object information are essential tasks. Thus, the objective of this research is to design and develop a fully automated photogrammetric data segmentation and object information extraction framework. To validate the proposed framework, the segmented data and extracted features were used to create virtual environments in the authors previously designed simulation tool i.e., Aerial Terrain Line of Sight Analysis System (ATLAS). The results showed that 3D mesh trees could be replaced with geo-typical 3D tree models using the extracted individual tree locations. The extracted tree features (i.e., color, width, height) are valuable for selecting the appropriate tree species and enhance visual quality. Furthermore, the identified ground material information can be taken into consideration for pathfinding. The shortest path can be computed not only considering the physical distance, but also considering the off-road vehicle performance capabilities on different ground surface materials.