Abstract:Exploring the capabilities of Neural Radiance Fields (NeRF) and Gaussian-based methods in the context of 3D scene reconstruction, this study contrasts these modern approaches with traditional Simultaneous Localization and Mapping (SLAM) systems. Utilizing datasets such as Replica and ScanNet, we assess performance based on tracking accuracy, mapping fidelity, and view synthesis. Findings reveal that NeRF excels in view synthesis, offering unique capabilities in generating new perspectives from existing data, albeit at slower processing speeds. Conversely, Gaussian-based methods provide rapid processing and significant expressiveness but lack comprehensive scene completion. Enhanced by global optimization and loop closure techniques, newer methods like NICE-SLAM and SplaTAM not only surpass older frameworks such as ORB-SLAM2 in terms of robustness but also demonstrate superior performance in dynamic and complex environments. This comparative analysis bridges theoretical research with practical implications, shedding light on future developments in robust 3D scene reconstruction across various real-world applications.
Abstract:Deep neural networks (DNNs) have achieved exceptional performance across various fields by learning complex nonlinear mappings from large-scale datasets. However, they encounter challenges such as high computational costs and limited interpretability. To address these issues, hybrid approaches that integrate physics with AI are gaining interest. This paper introduces a novel physics-based AI model called the "Nonlinear Schr\"odinger Network", which treats the Nonlinear Schr\"odinger Equation (NLSE) as a general-purpose trainable model for learning complex patterns including nonlinear mappings and memory effects from data. Existing physics-informed machine learning methods use neural networks to approximate the solutions of partial differential equations (PDEs). In contrast, our approach directly treats the PDE as a trainable model to obtain general nonlinear mappings that would otherwise require neural networks. As a physics-inspired approach, it offers a more interpretable and parameter-efficient alternative to traditional black-box neural networks, achieving comparable or better accuracy in time series classification tasks while significantly reducing the number of required parameters. Notably, the trained Nonlinear Schr\"odinger Network is interpretable, with all parameters having physical meanings as properties of a virtual physical system that transforms the data to a more separable space. This interpretability allows for insight into the underlying dynamics of the data transformation process. Applications to time series forecasting have also been explored. While our current implementation utilizes the NLSE, the proposed method of using physics equations as trainable models to learn nonlinear mappings from data is not limited to the NLSE and may be extended to other master equations of physics.
Abstract:Gaussian SLAM systems have made significant advancements in improving the efficiency and fidelity of real-time reconstructions. However, these systems often encounter incomplete reconstructions in complex indoor environments, characterized by substantial holes due to unobserved geometry caused by obstacles or limited view angles. To address this challenge, we present Manhattan Gaussian SLAM (MG-SLAM), an RGB-D system that leverages the Manhattan World hypothesis to enhance geometric accuracy and completeness. By seamlessly integrating fused line segments derived from structured scenes, MG-SLAM ensures robust tracking in textureless indoor areas. Moreover, The extracted lines and planar surface assumption allow strategic interpolation of new Gaussians in regions of missing geometry, enabling efficient scene completion. Extensive experiments conducted on both synthetic and real-world scenes demonstrate that these advancements enable our method to achieve state-of-the-art performance, marking a substantial improvement in the capabilities of Gaussian SLAM systems.
Abstract:Generative Adversarial Networks (GANs) have significantly advanced image processing, with Pix2Pix being a notable framework for image-to-image translation. This paper explores a novel application of Pix2Pix to transform abstract map images into realistic ground truth images, addressing the scarcity of such images crucial for domains like urban planning and autonomous vehicle training. We detail the Pix2Pix model's utilization for generating high-fidelity datasets, supported by a dataset of paired map and aerial images, and enhanced by a tailored training regimen. The results demonstrate the model's capability to accurately render complex urban features, establishing its efficacy and potential for broad real-world applications.
Abstract:The digital transformation is fundamentally changing our industries, affecting planning, execution as well as monitoring of production processes in a wide range of application fields. With product line-ups becoming more and more versatile and diverse, the necessary inspection and monitoring sparks significant novel requirements on the corresponding Nondestructive Evaluation (NDE) systems. The establishment of increasingly powerful approaches to incorporate Artificial Intelligence (AI) may provide just the needed innovation to solve some of these challenges. In this paper we provide a comprehensive survey about the usage of AI methods in NDE in light of the recent innovations towards NDE 4.0. Since we cannot discuss each NDE modality in one paper, we limit our attention to magnetic methods, ultrasound, thermography, as well as optical inspection. In addition to reviewing recent AI developments in each field, we draw common connections by pointing out NDE-related tasks that have a common underlying mathematical problem and categorizing the state of the art according to the corresponding sub-tasks. In so doing, interdisciplinary connections are drawn that provide a more complete overall picture.
Abstract:Strategic subsampling has become a focal point due to its effectiveness in compressing data, particularly in the Full Matrix Capture (FMC) approach in ultrasonic imaging. This paper introduces the Joint Deep Probabilistic Subsampling (J-DPS) method, which aims to learn optimal selection matrices simultaneously for transmitters, receivers, and Fourier coefficients. This task-based algorithm is realized by introducing a specialized measurement model and integrating a customized Complex Learned FISTA (CL-FISTA) network. We propose a parallel network architecture, partitioned into three segments corresponding to the three matrices, all working toward a shared optimization objective with adjustable loss allocation. A synthetic dataset is designed to reflect practical scenarios, and we provide quantitative comparisons with a traditional CRB-based algorithm, standard DPS, and J-DPS.
Abstract:We consider minimizing a function consisting of a quadratic term and a proximable term which is possibly nonconvex and nonsmooth. This problem is also known as scaled proximal operator. Despite its simple form, existing methods suffer from slow convergence or high implementation complexity or both. To overcome these limitations, we develop a fast and user-friendly second-order proximal algorithm. Key innovation involves building and solving a series of opportunistically majorized problems along a hybrid Newton direction. The approach directly uses the precise Hessian of the quadratic term, and calculates the inverse only once, eliminating the iterative numerical approximation of the Hessian, a common practice in quasi-Newton methods. The algorithm's convergence to a critical point is established, and local convergence rate is derived based on the Kurdyka-Lojasiewicz property of the objective function. Numerical comparisons are conducted on well-known optimization problems. The results demonstrate that the proposed algorithm not only achieves a faster convergence but also tends to converge to a better local optimum compare to benchmark algorithms.
Abstract:Gradient compression with error compensation has attracted significant attention with the target of reducing the heavy communication overhead in distributed learning. However, existing compression methods either perform only unidirectional compression in one iteration with higher communication cost, or bidirectional compression with slower convergence rate. In this work, we propose the Local Immediate Error Compensated SGD (LIEC-SGD) optimization algorithm to break the above bottlenecks based on bidirectional compression and carefully designed compensation approaches. Specifically, the bidirectional compression technique is to reduce the communication cost, and the compensation technique compensates the local compression error to the model update immediately while only maintaining the global error variable on the server throughout the iterations to boost its efficacy. Theoretically, we prove that LIEC-SGD is superior to previous works in either the convergence rate or the communication cost, which indicates that LIEC-SGD could inherit the dual advantages from unidirectional compression and bidirectional compression. Finally, experiments of training deep neural networks validate the effectiveness of the proposed LIEC-SGD algorithm.
Abstract:Realtime high-throughput sensing and detection enables the capture of rare events within sub-picosecond time scale, which makes it possible for scientists to uncover the mystery of ultrafast physical processes. Photonic time stretch is one of the most successful approaches that utilize the ultra-wide bandwidth of mode-locked laser for detecting ultrafast signal. Though powerful, it relies on supercontinuum mode-locked laser source, which is expensive and difficult to integrate. This greatly limits the application of this technology. Here we propose a novel Continuous Wave (CW) implementation of the photonic time stretch. Instead of a supercontinuum mode-locked laser, a wavelength division multiplexed (WDM) CW laser, pulsed by electro-optic (EO) modulation, is adopted as the laser source. This opens up the possibility for low-cost integrated time stretch systems. This new approach is validated via both simulation and experiment. Two scenarios for potential application are also described.
Abstract:As a harzard disaster, landslide often brings tremendous losses to humanity, so it's necessary to achieve reliable detection of landslide. However, the problems of visual blur and small-sized dataset cause great challenges for old landslide detection task when using remote sensing data. To reliably extract semantic features, a hyper-pixel-wise contrastive learning augmented segmentation network (HPCL-Net) is proposed, which augments the local salient feature extraction from the boundaries of landslides through HPCL and fuses the heterogeneous infromation in the semantic space from High-Resolution Remote Sensing Images and Digital Elevation Model Data data. For full utilization of the precious samples, a global hyper-pixel-wise sample pair queues-based contrastive learning method, which includes the construction of global queues that store hyper-pixel-wise samples and the updating scheme of a momentum encoder, is developed, reliably enhancing the extraction ability of semantic features. The proposed HPCL-Net is evaluated on a Loess Plateau old landslide dataset and experiment results show that the model greatly improves the reliablity of old landslide detection compared to the previous old landslide segmentation model, where mIoU metric is increased from 0.620 to 0.651, Landslide IoU metric is increased from 0.334 to 0.394 and F1-score metric is increased from 0.501 to 0.565.