Zixin Luo

Affine-based Deformable Attention and Selective Fusion for Semi-dense Matching

May 22, 2024

Hongkai Chen, Zixin Luo, Yurun Tian, Xuyang Bai, Ziyu Wang, Lei Zhou, Mingmin Zhen, Tian Fang, David McKinnon, Yanghai Tsin (+1 more)

Abstract: Identifying robust and accurate correspondences across images is a fundamental problem in computer vision that enables various downstream tasks. Recent semi-dense matching methods emphasize the effectiveness of fusing relevant cross-view information through Transformers. In this paper, we propose several improvements upon this paradigm. Firstly, we introduce affine-based local attention to model cross-view deformations. Secondly, we present selective fusion to merge local and global messages from cross attention. Apart from network structure, we also identify the importance of enforcing spatial smoothness in loss design, which has been omitted by previous works. Based on these augmentations, our network demonstrates strong matching capacity under different settings. The full version of our network achieves state-of-the-art performance among semi-dense matching methods at a similar cost to LoFTR, while the slim version reaches the LoFTR baseline's performance with only 15% of the computation cost and 18% of the parameters.

* Accepted to CVPR2024 Image Matching Workshop
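
One way to picture affine-based local attention is a local sampling window whose offsets are warped by a 2x2 matrix before features are gathered. The sketch below only illustrates that geometric idea; the function name `affine_window`, the window radius, and the use of a plain 2x2 matrix are assumptions for illustration and are not taken from the paper.

```python
def affine_window(center, A, radius=1):
    """Warp a regular (2*radius+1)^2 window of offsets by a 2x2 matrix A.

    Returns the (x, y) sampling positions around `center`.
    Hypothetical helper: the paper's actual parameterization may differ.
    """
    cx, cy = center
    points = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ox = A[0][0] * dx + A[0][1] * dy   # warped x offset
            oy = A[1][0] * dx + A[1][1] * dy   # warped y offset
            points.append((cx + ox, cy + oy))
    return points

# Identity matrix: an ordinary 3x3 window.
identity = affine_window((5.0, 5.0), [[1.0, 0.0], [0.0, 1.0]])
# Horizontal stretch: the window covers twice the width.
stretched = affine_window((5.0, 5.0), [[2.0, 0.0], [0.0, 1.0]])
```

With the identity matrix the window is the usual fixed grid; any other matrix stretches, rotates, or shears the positions at which the other view would be sampled.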

ASpanFormer: Detector-Free Image Matching with Adaptive Span Transformer

Aug 30, 2022

Hongkai Chen, Zixin Luo, Lei Zhou, Yurun Tian, Mingmin Zhen, Tian Fang, David McKinnon, Yanghai Tsin, Long Quan

Abstract: Generating robust and reliable correspondences across images is a fundamental task for a diversity of applications. To capture context at both global and local granularity, we propose ASpanFormer, a Transformer-based detector-free matcher that is built on a hierarchical attention structure, adopting a novel attention operation which is capable of adjusting the attention span in a self-adaptive manner. To achieve this goal, first, flow maps are regressed in each cross attention phase to locate the center of the search region. Next, a sampling grid is generated around the center, whose size, instead of being empirically configured as fixed, is adaptively computed from a pixel uncertainty estimated along with the flow map. Finally, attention is computed across two images within the derived regions, referred to as the attention span. By these means, we are able to not only maintain long-range dependencies, but also enable fine-grained attention among pixels of high relevance, which compensates for the essential locality and piece-wise smoothness in matching tasks. State-of-the-art accuracy on a wide range of evaluation benchmarks validates the strong matching capability of our method.

* Accepted to ECCV2022, project page at https://aspanformer.github.io/
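
The adaptive span described in the abstract (a sampling grid centered on the predicted flow, sized by the estimated uncertainty) can be sketched as follows. This is a minimal illustration, assuming the uncertainty is a single standard deviation per pixel; the name `adaptive_span_grid`, the scale factor `k`, and the grid resolution `n` are hypothetical and not from the paper.

```python
def adaptive_span_grid(center, sigma, n=5, k=2.0):
    """Return an n x n grid of (x, y) sample points centered at `center`.

    The grid half-width is k * sigma, so a more uncertain flow estimate
    yields a wider attention span. `k` and `n` are illustrative values;
    n must be at least 2.
    """
    cx, cy = center
    half = k * sigma
    # n evenly spaced offsets covering [-half, +half]
    offsets = [-half + 2.0 * half * i / (n - 1) for i in range(n)]
    return [[(cx + dx, cy + dy) for dx in offsets] for dy in offsets]

confident = adaptive_span_grid((10.0, 20.0), sigma=0.5)   # narrow span
uncertain = adaptive_span_grid((10.0, 20.0), sigma=3.0)   # wide span
```

Both grids contain the same number of samples, so the cost of attending over them would be the same; only the area they cover changes with the uncertainty.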

Learning to Match Features with Seeded Graph Matching Network

Aug 19, 2021

Hongkai Chen, Zixin Luo, Jiahui Zhang, Lei Zhou, Xuyang Bai, Zeyu Hu, Chiew-Lan Tai, Long Quan

Abstract: Matching local features across images is a fundamental problem in computer vision. Targeting high accuracy and efficiency, we propose Seeded Graph Matching Network, a graph neural network with a sparse structure to reduce redundant connectivity and learn compact representations. The network consists of 1) a Seeding Module, which initializes the matching by generating a small set of reliable matches as seeds, and 2) a Seeded Graph Neural Network, which utilizes seed matches to pass messages within/across images and predicts assignment costs. Three novel operations are proposed as basic elements for message passing: 1) Attentional Pooling, which aggregates keypoint features within the image to seed matches; 2) Seed Filtering, which enhances seed features and exchanges messages across images; and 3) Attentional Unpooling, which propagates seed features back to the original keypoints. Experiments show that our method significantly reduces computational and memory complexity compared with typical attention-based networks, while achieving competitive or higher performance.

* Accepted by ICCV2021, code to be released at https://github.com/vdvchen/SGMNet
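
The pooling and unpooling steps named in the abstract can be illustrated with plain dot-product attention between all keypoints and a small seed set. This is a simplified sketch under stated assumptions: it uses unlearned features directly as queries, keys, and values, omits the Seed Filtering step, and the helper names `softmax` and `attend` are hypothetical.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, keys, values):
    """Dot-product attention: each query gets a weighted average of values."""
    dim = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

# Toy data: 4 keypoint features, of which indices 0 and 3 act as seeds.
keypoints = [[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.0, 1.0]]
seed_idx = [0, 3]
seeds = [keypoints[i] for i in seed_idx]

pooled = attend(seeds, keypoints, keypoints)   # pooling: keypoints -> seeds
unpooled = attend(keypoints, pooled, pooled)   # unpooling: seeds -> keypoints
```

With N keypoints and K seeds, each call above scores N x K pairs instead of the N x N pairs that full self-attention over keypoints would need.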

PointDSC: Robust Point Cloud Registration using Deep Spatial Consistency

Mar 09, 2021

Xuyang Bai, Zixin Luo, Lei Zhou, Hongkai Chen, Lei Li, Zeyu Hu, Hongbo Fu, Chiew-Lan Tai

Abstract: Removing outlier correspondences is one of the critical steps for successful feature-based point cloud registration. Despite the increasing popularity of introducing deep learning methods in this field, spatial consistency, which is essentially established by a Euclidean transformation between point clouds, has received almost no individual attention in existing learning frameworks. In this paper, we present PointDSC, a novel deep neural network that explicitly incorporates spatial consistency for pruning outlier correspondences. First, we propose a nonlocal feature aggregation module, weighted by both feature and spatial coherence, for feature embedding of the input correspondences. Second, we formulate a differentiable spectral matching module, supervised by pairwise spatial compatibility, to estimate the inlier confidence of each correspondence from the embedded features. With modest computation cost, our method outperforms the state-of-the-art hand-crafted and learning-based outlier rejection approaches on several real-world datasets by a significant margin. We also show its wide applicability by combining PointDSC with different 3D local descriptors.

* Accepted to CVPR 2021, supplementary materials included
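
Spatial consistency rests on the fact that a Euclidean transformation preserves distances, so two inlier correspondences should agree on the distance between their points in both clouds. The sketch below shows the classical, non-learned version of this idea: a binary compatibility matrix followed by power iteration for its leading eigenvector. It is not the PointDSC network; the threshold value and the function names are assumptions.

```python
import math

def spatial_compatibility(src, dst, thresh=0.1):
    """M[i][j] = 1 if correspondences i and j preserve their pairwise distance."""
    n = len(src)
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d_src = math.dist(src[i], src[j])
                d_dst = math.dist(dst[i], dst[j])
                M[i][j] = 1.0 if abs(d_src - d_dst) < thresh else 0.0
    return M

def leading_eigenvector(M, iters=50):
    """Power iteration; entries act as per-correspondence inlier confidence."""
    n = len(M)
    v = [1.0] * n
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        v = [x / norm for x in w]
    return v

# Four correspondences related by a translation of +1 in x, plus one outlier.
src = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
dst = [(1, 0, 0), (2, 0, 0), (1, 1, 0), (1, 0, 1), (5, 5, 5)]
confidence = leading_eigenvector(spatial_compatibility(src, dst))
```

In this toy case the four consistent correspondences receive equal, non-zero confidence, and the outlier, which is compatible with none of them, receives zero.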

Visibility-aware Multi-view Stereo Network

Aug 19, 2020

Jingyang Zhang, Yao Yao, Shiwei Li, Zixin Luo, Tian Fang

Abstract: Learning-based multi-view stereo (MVS) methods have demonstrated promising results. However, very few existing networks explicitly take pixel-wise visibility into consideration, resulting in erroneous cost aggregation from occluded pixels. In this paper, we explicitly infer and integrate the pixel-wise occlusion information in the MVS network via matching uncertainty estimation. The pair-wise uncertainty map is jointly inferred with the pair-wise depth map, and is further used as weighting guidance during the multi-view cost volume fusion. As such, the adverse influence of occluded pixels is suppressed in the cost fusion. The proposed framework, Vis-MVSNet, significantly improves depth accuracy in scenes with severe occlusion. Extensive experiments are performed on the DTU, BlendedMVS, and Tanks and Temples datasets to justify the effectiveness of the proposed framework.

* Accepted to BMVC 2020
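
The uncertainty-guided fusion described in the abstract can be sketched for a single pixel as a weighted average of per-view costs. This is a minimal illustration: the weighting `exp(-u)` is an assumed choice, not necessarily the paper's, and `fuse_costs` is a hypothetical name.

```python
import math

def fuse_costs(pair_costs, pair_uncertainties):
    """Fuse per-pair matching costs for one pixel across depth hypotheses.

    pair_costs[v][d]: cost of depth hypothesis d from source view v.
    pair_uncertainties[v]: scalar uncertainty of view v at this pixel.
    Weights exp(-u) are an assumption: high uncertainty gives low weight.
    """
    weights = [math.exp(-u) for u in pair_uncertainties]
    total = sum(weights)
    num_depths = len(pair_costs[0])
    return [sum(w * c[d] for w, c in zip(weights, pair_costs)) / total
            for d in range(num_depths)]

def argmin(xs):
    return min(range(len(xs)), key=xs.__getitem__)

# View 0 sees the pixel clearly; view 1 is occluded and disagrees.
costs = [[0.9, 0.3, 0.8], [0.2, 0.9, 0.3]]
best_weighted = argmin(fuse_costs(costs, [0.1, 3.0]))   # trusts view 0
best_uniform = argmin(fuse_costs(costs, [1.0, 1.0]))    # plain averaging
```

In this toy case the weighted fusion follows the reliable view and selects hypothesis 1, while plain averaging is pulled away from it by the occluded view.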

Learning Stereo Matchability in Disparity Regression Networks

Aug 11, 2020

Jingyang Zhang, Yao Yao, Zixin Luo, Shiwei Li, Tianwei Shen, Tian Fang, Long Quan

Abstract: Learning-based stereo matching has recently achieved promising results, yet still has difficulty establishing reliable matches in weakly matchable regions that are textureless, non-Lambertian, or occluded. In this paper, we address this challenge by proposing a stereo matching network that considers pixel-wise matchability. Specifically, the network jointly regresses disparity and matchability maps from a 3D probability volume through expectation and entropy operations. Next, a learned attenuation is applied as the robust loss function to alleviate the influence of weakly matchable pixels during training. Finally, a matchability-aware disparity refinement is introduced to improve the depth inference in weakly matchable regions. The proposed deep stereo matchability (DSM) framework can improve the matching result or accelerate the computation while still guaranteeing the quality. Moreover, the DSM framework is portable to many recent stereo networks. Extensive experiments are conducted on the Scene Flow and KITTI stereo datasets to demonstrate the effectiveness of the proposed framework over the state-of-the-art learning-based stereo methods.

* Accepted to ICPR 2020
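
The expectation and entropy operations mentioned in the abstract can be shown for a single pixel of the probability volume. This is a sketch only: how the paper converts entropy into a matchability score is not stated in the abstract, so the code returns the raw entropy, and the function name is hypothetical.

```python
import math

def disparity_and_entropy(probs):
    """probs: probability over disparity candidates 0..D-1 for one pixel."""
    # Expectation over candidates gives a sub-pixel disparity estimate.
    disparity = sum(d * p for d, p in enumerate(probs))
    # Entropy measures how spread out the distribution is.
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return disparity, entropy

peaked = disparity_and_entropy([0.0, 0.0, 1.0, 0.0])      # confident match
flat = disparity_and_entropy([0.25, 0.25, 0.25, 0.25])    # ambiguous match
```

A sharply peaked distribution has zero entropy, while a flat one reaches the maximum `log(D)`, which is the kind of signal that can flag textureless or occluded pixels.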

Stochastic Bundle Adjustment for Efficient and Scalable 3D Reconstruction

Aug 02, 2020

Lei Zhou, Zixin Luo, Mingmin Zhen, Tianwei Shen, Shiwei Li, Zhuofei Huang, Tian Fang, Long Quan

Abstract: Current bundle adjustment solvers such as the Levenberg-Marquardt (LM) algorithm are limited by the bottleneck in solving the Reduced Camera System (RCS), whose dimension is proportional to the number of cameras. When the problem is scaled up, this step is neither efficient in computation nor manageable for a single compute node. In this work, we propose a stochastic bundle adjustment algorithm which seeks to decompose the RCS approximately inside the LM iterations to improve the efficiency and scalability. It first reformulates the quadratic programming problem of an LM iteration based on the clustering of the visibility graph by introducing equality constraints across clusters. Then, we propose to relax it into a chance-constrained problem and solve it through a sampled convex program. The relaxation is intended to eliminate the interdependence between clusters embodied by the constraints, so that a large RCS can be decomposed into independent linear sub-problems. Numerical experiments on unordered Internet image sets and sequential SLAM image sets, as well as distributed experiments on large-scale datasets, have demonstrated the high efficiency and scalability of the proposed approach. Codes are released at https://github.com/zlthinker/STBA.

* Accepted by ECCV 2020

ASLFeat: Learning Local Features of Accurate Shape and Localization

Apr 19, 2020

Zixin Luo, Lei Zhou, Xuyang Bai, Hongkai Chen, Jiahui Zhang, Yao Yao, Shiwei Li, Tian Fang, Long Quan

Abstract: This work focuses on mitigating two limitations in the joint learning of local feature detectors and descriptors. First, the ability to estimate the local shape (scale, orientation, etc.) of feature points is often neglected during dense feature extraction, while shape-awareness is crucial to acquire stronger geometric invariance. Second, the localization accuracy of detected keypoints is not sufficient to reliably recover camera geometry, which has become the bottleneck in tasks such as 3D reconstruction. In this paper, we present ASLFeat, with three light-weight yet effective modifications to mitigate the above issues. First, we resort to deformable convolutional networks to densely estimate and apply local transformations. Second, we take advantage of the inherent feature hierarchy to restore spatial resolution and low-level details for accurate keypoint localization. Finally, we use a peakiness measurement to relate feature responses and derive more indicative detection scores. The effect of each modification is thoroughly studied, and the evaluation is extensively conducted across a variety of practical scenarios. State-of-the-art results are reported that demonstrate the superiority of our methods.

* Accepted to CVPR 2020, supplementary materials included, code available

KFNet: Learning Temporal Camera Relocalization using Kalman Filtering

Mar 24, 2020

Lei Zhou, Zixin Luo, Tianwei Shen, Jiahui Zhang, Mingmin Zhen, Yao Yao, Tian Fang, Long Quan

Abstract: Temporal camera relocalization estimates the pose with respect to each video frame in sequence, as opposed to one-shot relocalization which focuses on a still image. Even though the time dependency has been taken into account, current temporal relocalization methods still generally underperform the state-of-the-art one-shot approaches in terms of accuracy. In this work, we improve the temporal relocalization method by using a network architecture that incorporates Kalman filtering (KFNet) for online camera relocalization. In particular, KFNet extends the scene coordinate regression problem to the time domain in order to recursively establish 2D and 3D correspondences for the pose determination. The network architecture design and the loss formulation are based on Kalman filtering in the context of Bayesian learning. Extensive experiments on multiple relocalization benchmarks demonstrate the high accuracy of KFNet, which ranks at the top of both one-shot and temporal relocalization approaches. Our codes are released at https://github.com/zlthinker/KFNet.

* An oral paper of CVPR 2020
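
For readers unfamiliar with the Kalman filtering that the abstract builds on, the textbook scalar measurement update is shown below. It is a generic illustration, not KFNet itself: the idea of treating a value propagated from the previous frame as the prior and the current frame's estimate as the measurement is an assumption made for the example, as are all names and numbers.

```python
def kalman_update(prior_mean, prior_var, meas, meas_var):
    """Fuse a prior estimate with a new measurement (scalar Kalman update)."""
    gain = prior_var / (prior_var + meas_var)        # Kalman gain in [0, 1]
    post_mean = prior_mean + gain * (meas - prior_mean)
    post_var = (1.0 - gain) * prior_var              # never larger than the prior
    return post_mean, post_var

# Uncertain prior (variance 4) and a more reliable measurement (variance 1).
mean, var = kalman_update(prior_mean=1.0, prior_var=4.0, meas=2.0, meas_var=1.0)
```

The fused estimate lands closer to whichever input has the smaller variance, and its variance is lower than that of either input.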

D3Feat: Joint Learning of Dense Detection and Description of 3D Local Features

Mar 06, 2020

Xuyang Bai, Zixin Luo, Lei Zhou, Hongbo Fu, Long Quan, Chiew-Lan Tai

Abstract: Successful point cloud registration often relies on the robust establishment of sparse matches through discriminative 3D local features. Despite the fast evolution of learning-based 3D feature descriptors, little attention has been drawn to the learning of 3D feature detectors, and even less to the joint learning of the two tasks. In this paper, we leverage a 3D fully convolutional network for 3D point clouds, and propose a novel and practical learning mechanism that densely predicts both a detection score and a description feature for each 3D point. In particular, we propose a keypoint selection strategy that overcomes the inherent density variations of 3D point clouds, and further propose a self-supervised detector loss guided by the on-the-fly feature matching results during training. Finally, our method achieves state-of-the-art results in both indoor and outdoor scenarios, evaluated on the 3DMatch and KITTI datasets, and shows its strong generalization ability on the ETH dataset. Towards practical use, we show that by adopting a reliable feature detector, sampling a smaller number of features is sufficient to achieve accurate and fast point cloud alignment. Code is released at https://github.com/XuyangBai/D3Feat.

* Accepted to CVPR 2020, supplementary materials included
