Abstract:Establishing reliable correspondences is essential for registration tasks such as 3D and 2D3D registration. Existing methods commonly leverage geometric or semantic point features to generate potential correspondences. However, these features may face challenges such as large deformation, scale inconsistency, and ambiguous matching problems (e.g., symmetry). Additionally, many previous methods, which rely on single-pass prediction, may struggle with local minima in complex scenarios. To mitigate these challenges, we introduce a diffusion matching model for robust correspondence construction. Our approach treats correspondence estimation as a denoising diffusion process within the doubly stochastic matrix space, which gradually denoises (refines) a doubly stochastic matching matrix to the ground-truth one for high-quality correspondence estimation. It involves a forward diffusion process that gradually introduces Gaussian noise into the ground truth matching matrix and a reverse denoising process that iteratively refines the noisy matching matrix. In particular, the feature extraction from the backbone occurs only once during the inference phase. Our lightweight denoising module utilizes the same feature at each reverse sampling step. Evaluation of our method on both 3D and 2D3D registration tasks confirms its effectiveness.
Abstract:Efficiently finding optimal correspondences between point clouds is crucial for solving both rigid and non-rigid point cloud registration problems. Existing methods often rely on geometric or semantic feature embedding to establish correspondences and estimate transformations or flow fields. Recently, state-of-the-art methods have employed RAFT-like iterative updates to refine the solution. However, these methods have certain limitations. Firstly, their iterative refinement design lacks transparency, and their iterative updates follow a fixed path during the refinement process, which can lead to suboptimal results. Secondly, these methods overlook the importance of refining or optimizing correspondences (or matching matrices) as a precursor to solving transformations or flow fields. They typically compute candidate correspondences based on distances in the point feature space. However, they only project the candidate matching matrix into some matrix space once with Sinkhorn or dual softmax operations to obtain final correspondences. This one-shot projected matching matrix may be far from the globally optimal one, and these approaches do not consider the distribution of the target matching matrix. In this paper, we propose a novel approach that exploits the Denoising Diffusion Model to predict a searching gradient for the optimal matching matrix within the Doubly Stochastic Matrix Space. During the reverse denoising process, our method iteratively searches for better solutions along this denoising gradient, which points towards the maximum likelihood direction of the target matching matrix. Our method offers flexibility by allowing the search to start from any initial matching matrix provided by the online backbone or white noise. Experimental evaluations on the 3DMatch/3DLoMatch and 4DMatch/4DLoMatch datasets demonstrate the effectiveness of our newly designed framework.
Abstract:Point Cloud Registration (PCR) is a critical and challenging task in computer vision. One of the primary difficulties in PCR is identifying salient and meaningful points that exhibit consistent semantic and geometric properties across different scans. Previous methods have encountered challenges with ambiguous matching due to the similarity among patch blocks throughout the entire point cloud and the lack of consideration for efficient global geometric consistency. To address these issues, we propose a new framework that includes several novel techniques. Firstly, we introduce a semantic-aware geometric encoder that combines object-level and patch-level semantic information. This encoder significantly improves registration recall by reducing ambiguity in patch-level superpoint matching. Additionally, we incorporate a prior knowledge approach that utilizes an intrinsic shape signature to identify salient points. This enables us to extract the most salient super points and meaningful dense points in the scene. Secondly, we introduce an innovative transformer that encodes High-Order (HO) geometric features. These features are crucial for identifying salient points within initial overlap regions while considering global high-order geometric consistency. To optimize this high-order transformer further, we introduce an anchor node selection strategy. By encoding inter-frame triangle or polyhedron consistency features based on these anchor nodes, we can effectively learn high-order geometric features of salient super points. These high-order features are then propagated to dense points and utilized by a Sinkhorn matching module to identify key correspondences for successful registration. In our experiments conducted on well-known datasets such as 3DMatch/3DLoMatch and KITTI, our approach has shown promising results, highlighting the effectiveness of our novel method.
Abstract:Point Clouds Registration is a fundamental and challenging problem in 3D computer vision. It has been shown that the isometric transformation is an essential property in rigid point cloud registration, but the existing methods only utilize it in the outlier rejection stage. In this paper, we emphasize that the isometric transformation is also important in the feature learning stage for improving registration quality. We propose a \underline{G}raph \underline{M}atching \underline{O}ptimization based \underline{Net}work (denoted as GMONet for short), which utilizes the graph matching method to explicitly exert the isometry preserving constraints in the point feature learning stage to improve %refine the point representation. Specifically, we %use exploit the partial graph matching constraint to enhance the overlap region detection abilities of super points ($i.e.,$ down-sampled key points) and full graph matching to refine the registration accuracy at the fine-level overlap region. Meanwhile, we leverage the mini-batch sampling to improve the efficiency of the full graph matching optimization. Given high discriminative point features in the evaluation stage, we utilize the RANSAC approach to estimate the transformation between the scanned pairs. The proposed method has been evaluated on the 3DMatch/3DLoMatch benchmarks and the KITTI benchmark. The experimental results show that our method achieves competitive performance compared with the existing state-of-the-art baselines.
Abstract:Resource constraints, e.g. limited product inventory or product categories, may affect consumers' choices or preferences in some recommendation tasks, but are usually ignored in previous recommendation methods. In this paper, we aim to mine the cue of user preferences in resource-limited recommendation tasks, for which purpose we specifically build a largely used car transaction dataset possessing resource-limitation characteristics. Accordingly, we propose an interest-behaviour multiplicative network to predict the user's future interaction based on dynamic connections between users and items. To describe the user-item connection dynamically, mutually-recursive recurrent neural networks (MRRNNs) are introduced to capture interactive long-term dependencies, and meantime effective representations of users and items are obtained. To further take the resource limitation into consideration, a resource-limited branch is built to specifically explore the influence of resource variation caused by user behaviour for user preferences. Finally, mutual information is introduced to measure the similarity between the user action and fused features to predict future interaction, where the fused features come from both MRRNNs and resource-limited branches. We test the performance on the built used car transaction dataset as well as the Tmall dataset, and the experimental results verify the effectiveness of our framework.