Abstract:The perspective-$n$-point (P$n$P) problem is important for robotic pose estimation. It is well studied for optical cameras, but research is lacking for 2D forward-looking sonar (FLS) in underwater scenarios due to the vastly different imaging principles. In this paper, we demonstrate that, despite the nonlinearity inherent in sonar image formation, the P$n$P problem for 2D FLS can still be effectively addressed within a point-to-line (PtL) 3D registration paradigm through orthographic approximation. The registration is then resolved by a duality-based optimal solver, ensuring the global optimality. For coplanar cases, a null space analysis is conducted to retrieve the solutions from the dual formulation, enabling the methods to be applied to more general cases. Extensive simulations have been conducted to systematically evaluate the performance under different settings. Compared to non-reprojection-optimized state-of-the-art (SOTA) methods, the proposed approach achieves significantly higher precision. When both methods are optimized, ours demonstrates comparable or slightly superior precision.
Abstract:Rejecting outliers before applying classical robust methods is a common approach to increase the success rate of estimation, particularly when the outlier ratio is extremely high (e.g. 90%). However, this method often relies on sensor- or task-specific characteristics, which may not be easily transferable across different scenarios. In this paper, we focus on the problem of rejecting 2D-3D point correspondence outliers from 2D forward-looking sonar (2D FLS) observations, which is one of the most popular perception device in the underwater field but has a significantly different imaging mechanism compared to widely used perspective cameras and LiDAR. We fully leverage the narrow field of view in the elevation of 2D FLS and develop two compatibility tests for different 3D point configurations: (1) In general cases, we design a pairwise length in-range test to filter out overly long or short edges formed from point sets; (2) In coplanar cases, we design a coplanarity test to check if any four correspondences are compatible under a coplanar setting. Both tests are integrated into outlier rejection pipelines, where they are followed by maximum clique searching to identify the largest consistent measurement set as inliers. Extensive simulations demonstrate that the proposed methods for general and coplanar cases perform effectively under outlier ratios of 80% and 90%, respectively.
Abstract:Channel estimation is a fundamental task in communication systems and is critical for effective demodulation. While most works deal with a simple scenario where the measurements are corrupted by the additive white Gaussian noise (AWGN), this work addresses the more challenging scenario where both AWGN and structured interference coexist. Such conditions arise, for example, when a sonar/radar transmitter and a communication receiver operate simultaneously within the same bandwidth. To ensure accurate channel estimation in these scenarios, the sparsity of the channel in the delay domain and the complicate structure of the interference are jointly exploited. Firstly, the score of the structured interference is learned via a neural network based on the diffusion model (DM), while the channel prior is modeled as a Gaussian distribution, with its variance controlling channel sparsity, similar to the setup of the sparse Bayesian learning (SBL). Then, two efficient posterior sampling methods are proposed to jointly estimate the sparse channel and the interference. Nuisance parameters, such as the variance of the prior are estimated via the expectation maximization (EM) algorithm. The proposed method is termed as DM based SBL (DM-SBL). Numerical simulations demonstrate that DM-SBL significantly outperforms conventional approaches that deal with the AWGN scenario, particularly under low signal-to-interference ratio (SIR) conditions. Beyond channel estimation, DM-SBL also shows promise for addressing other linear inverse problems involving structured interference.
Abstract:The emergence of 3D Gaussian Splatting (3DGS) has recently sparked a renewed wave of dense visual SLAM research. However, current methods face challenges such as sensitivity to artifacts and noise, sub-optimal selection of training viewpoints, and a lack of light global optimization. In this paper, we propose a dense SLAM system that tightly couples 3DGS with ORB features. We design a joint optimization approach for robust tracking and effectively reducing the impact of noise and artifacts. This involves combining novel geometric observations, derived from accumulated transmittance, with ORB features extracted from pixel data. Furthermore, to improve mapping quality, we propose an adaptive Gaussian expansion and regularization method that enables Gaussian primitives to represent the scene compactly. This is coupled with a viewpoint selection strategy based on the hybrid graph to mitigate over-fitting effects and enhance convergence quality. Finally, our approach achieves compact and high-quality scene representations and accurate localization. GSORB-SLAM has been evaluated on different datasets, demonstrating outstanding performance. The code will be available.
Abstract:We present InstantStyleGaussian, an innovative 3D style transfer method based on the 3D Gaussian Splatting (3DGS) scene representation. By inputting a target style image, it quickly generates new 3D GS scenes. Our approach operates on pre-reconstructed GS scenes, combining diffusion models with an improved iterative dataset update strategy. It utilizes diffusion models to generate target style images, adds these new images to the training dataset, and uses this dataset to iteratively update and optimize the GS scenes. Extensive experimental results demonstrate that our method ensures high-quality stylized scenes while offering significant advantages in style transfer speed and consistency.
Abstract:Block transmission systems have been proven successful over frequency-selective channels. For time-varying channel such as in high-speed mobile communication and underwater communication, existing equalizers assume that channels over different data frames are independent. However, the real-world channels over different data frames are correlated, thereby indicating potentials for performance improvement. In this paper, we propose a joint channel estimation and equalization/decoding algorithm for a single-carrier system that exploits temporal correlations of channel between transmitted data frames. Leveraging the concept of dynamic compressive sensing, our method can utilize the information of several data frames to achieve better performance. The information not only passes between the channel and symbol, but also the channels over different data frames. Numerical simulations using an extensively validated underwater acoustic model with a time-varying channel establish that the proposed algorithm outperforms the former bilinear generalized approximate message passing equalizer and classic minimum mean square error turbo equalizer in bit error rate and channel estimation normalized mean square error. The algorithm idea we present can also find applications in other bilinear multiple measurements vector compressive sensing problems.
Abstract:Forward-Looking Sonar (FLS) has started to gain attention in the field of near-bottom close-range underwater inspection because of its high resolution and high framerate features. Although Automatic Target Recognition (ATR) algorithms have been applied tentatively for object-searching tasks, human supervision is still indispensable, especially when involving critical areas. A clear FLS mosaic containing all suspicious information is in demand to help experts deal with tremendous perception data. However, previous work only considered that FLS is working in an ideal system configuration, which assumes an appropriate sonar imaging setup and the availability of accurate positioning data. Without those promises, the intra-frame and inter-frame artifacts will appear and degrade the quality of the final mosaic by making the information of interest invisible. In this paper, we propose a novel blending method for FLS mosaicing which can preserve interested information. A Long-Short Time Sliding Window (LST-SW) is designed to rectify the local statistics of raw sonar images. The statistics are then utilized to construct a Global Variance Map (GVM). The GVM helps to emphasize the useful information contained in images in the blending phase by classifying the informative and featureless pixels, thereby enhancing the quality of final mosaic. The method is verified using data collected in the real environment. The results show that our method can preserve more details in FLS mosaics for human inspection purposes in practice.
Abstract:In the use of database systems, the design of the storage engine and data model directly affects the performance of the database when performing queries. Therefore, the users of the database need to select the storage engine and design data model according to the workload encountered. However, in a hybrid workload, the query set of the database is dynamically changing, and the design of its optimal storage structure is also changing. Motivated by this, we propose an automatic storage structure selection system based on learning cost, which is used to dynamically select the optimal storage structure of the database under hybrid workloads. In the system, we introduce a machine learning method to build a cost model for the storage engine, and a column-oriented data layout generation algorithm. Experimental results show that the proposed system can choose the optimal combination of storage engine and data model according to the current workload, which greatly improves the performance of the default storage structure. And the system is designed to be compatible with different storage engines for easy use in practical applications.
Abstract:A high redundant non-holonomic humanoid mobile dual-arm manipulator system is presented in this paper where the motion planning to realize "human-like" autonomous navigation and manipulation tasks is studied. Firstly, an improved MaxiMin NSGA-II algorithm, which optimizes five objective functions to solve the problems of singularity, redundancy, and coupling between mobile base and manipulator simultaneously, is proposed to design the optimal pose to manipulate the target object. Then, in order to link the initial pose and that optimal pose, an off-line motion planning algorithm is designed. In detail, an efficient direct-connect bidirectional RRT and gradient descent algorithm is proposed to reduce the sampled nodes largely, and a geometric optimization method is proposed for path pruning. Besides, head forward behaviors are realized by calculating the reasonable orientations and assigning them to the mobile base to improve the quality of human-robot interaction. Thirdly, the extension to on-line planning is done by introducing real-time sensing, collision-test and control cycles to update robotic motion in dynamic environments. Fourthly, an EEs' via-point-based multi-objective genetic algorithm is proposed to design the "human-like" via-poses by optimizing four objective functions. Finally, numerous simulations are presented to validate the effectiveness of proposed algorithms.