Abstract: Local feature matching is an essential technique in image matching and plays a critical role in a wide range of vision-based applications. However, existing Transformer-based detector-free local feature matching methods are limited by the quadratic computational complexity of attention mechanisms, especially at high resolutions. Methods that reduce this cost with linear attention mechanisms still struggle to capture detailed local interactions, which affects the accuracy and robustness of precise local correspondences. To enhance the representational capacity of attention mechanisms while preserving low computational complexity, we propose LoFLAT, a novel Local Feature matching method using a Focused Linear Attention Transformer. LoFLAT consists of three main modules: the Feature Extraction Module, the Feature Transformer Module, and the Matching Module. Specifically, the Feature Extraction Module first uses ResNet and a Feature Pyramid Network to extract hierarchical features. The Feature Transformer Module then employs Focused Linear Attention to refine the attention distribution with a focused mapping function and to enhance feature diversity with a depth-wise convolution. Finally, the Matching Module predicts accurate and robust matches through a coarse-to-fine strategy. Extensive experimental evaluations demonstrate that the proposed LoFLAT outperforms the LoFTR method in terms of both efficiency and accuracy.
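The focused mapping function and depth-wise convolution mentioned in the abstract above can be illustrated with a minimal NumPy sketch. It is not the LoFLAT implementation: the function names, the focusing power `p`, the single-head layout, and the 1D depth-wise convolution along the token axis (standing in for a 2D one over the image grid) are all assumptions made for illustration. The point it shows is that computing `phi(K)^T V` first keeps the cost linear in the number of tokens.

```python
import numpy as np

def focused_map(x, p=3.0, eps=1e-6):
    """Assumed focused mapping: ReLU, element-wise power p, then rescale
    so each token vector keeps its original norm but a sharper direction."""
    x = np.maximum(x, 0.0) + eps
    xp = x ** p
    scale = np.linalg.norm(x, axis=-1, keepdims=True) / np.linalg.norm(xp, axis=-1, keepdims=True)
    return xp * scale

def depthwise_conv1d(v, kernel):
    """Filter each channel independently along the token axis.
    v: (N, d), kernel: (k, d) with k odd; zero padding keeps length N."""
    k = kernel.shape[0]
    pad = k // 2
    vp = np.pad(v, ((pad, pad), (0, 0)))
    out = np.zeros_like(v)
    for i in range(k):
        out += vp[i:i + v.shape[0]] * kernel[i]
    return out

def focused_linear_attention(q, k, v, kernel, p=3.0, eps=1e-6):
    """Single-head linear attention with a focused kernel.
    q, k, v: (N, d). Cost is O(N * d^2) instead of O(N^2 * d)."""
    fq, fk = focused_map(q, p), focused_map(k, p)
    kv = fk.T @ v                      # (d, d): aggregate keys and values once
    z = fq @ fk.sum(axis=0)            # (N,): per-query normaliser
    out = (fq @ kv) / (z[:, None] + eps)
    return out + depthwise_conv1d(v, kernel)  # local term for feature diversity

# Hypothetical usage on random tokens.
rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(8, 4)) for _ in range(3))
dw_kernel = rng.normal(size=(3, 4))
result = focused_linear_attention(q, k, v, dw_kernel)
```

Because `focused_map` returns non-negative features, the normaliser `z` stays positive, which is what allows the softmax to be dropped in this kind of attention.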
Abstract: Disparity prediction from stereo images is essential to computer vision applications including autonomous driving, 3D model reconstruction, and object detection. To predict accurate disparity maps, we propose a novel deep learning architecture, called MSDC-Net, for predicting the disparity map from a rectified pair of stereo images. MSDC-Net contains two modules: a multi-scale fusion 2D convolution module and a multi-scale residual 3D convolution module. The multi-scale fusion 2D convolution module exploits potential multi-scale features by extracting and fusing features at different scales using Dense-Net. The multi-scale residual 3D convolution module learns geometry context at different scales from the cost volume, which is aggregated by the multi-scale fusion 2D convolution module. Experimental results on the Scene Flow and KITTI datasets demonstrate that MSDC-Net significantly outperforms other approaches in the non-occluded region.
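The abstract above does not say how the cost volume is built from the 2D features. As a hedged illustration only, the sketch below uses a concatenation-style construction that is common in stereo networks: for each candidate disparity, left features are paired with right features shifted by that disparity. The function name, array layout, and the choice of concatenation are assumptions, not details confirmed for MSDC-Net.

```python
import numpy as np

def concat_cost_volume(left, right, max_disp):
    """Build a 4D cost volume from rectified stereo feature maps.
    left, right: (C, H, W). Returns (2C, max_disp, H, W).
    At disparity d, left pixel x is paired with right pixel x - d;
    positions with x < d have no match and are left as zeros."""
    c, h, w = left.shape
    volume = np.zeros((2 * c, max_disp, h, w), dtype=left.dtype)
    for d in range(max_disp):
        volume[:c, d, :, d:] = left[:, :, d:]
        volume[c:, d, :, d:] = right[:, :, :w - d]
    return volume

# Hypothetical usage with random feature maps.
rng = np.random.default_rng(0)
left_feat = rng.normal(size=(4, 5, 12))
right_feat = rng.normal(size=(4, 5, 12))
cost = concat_cost_volume(left_feat, right_feat, max_disp=6)
```

A volume of this shape is what 3D convolutions would then process, treating disparity as an extra spatial axis.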
Abstract: It is challenging to design a high-speed tracking approach using the l1-norm due to its non-differentiability. In this paper, a new kernelized correlation filter is introduced that leverages the sparsity attribute of l1-norm based regularization to design a high-speed tracker. We combine the l1-norm and l2-norm based regularizations in one Huber-type loss function, and then formulate an optimization problem in the Fourier domain for fast computation, which enables the tracker to adaptively ignore the noisy features produced by occlusion and illumination variation while keeping the advantages of l2-norm based regression. This is possible because, by the Convolution Theorem, correlation in the spatial domain corresponds to an element-wise product in the Fourier domain, so the l1-norm optimization problem can be decomposed into multiple sub-optimization problems in the Fourier domain. However, the optimized variables in the Fourier domain are complex, which makes using the l1-norm impossible if the real and imaginary parts of the variables cannot be separated. Our proposed optimization problem is formulated in such a way that the real and imaginary parts are indeed well separated. As such, it can be solved efficiently, with the optimal values obtained independently through closed-form solutions. Extensive experiments on two large benchmark datasets demonstrate that the proposed tracking algorithm significantly improves the tracking accuracy of the original kernelized correlation filter (KCF) with little sacrifice in tracking speed. Moreover, it outperforms the state-of-the-art approaches in terms of accuracy, efficiency, and robustness.
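The Convolution Theorem property that the abstract above relies on can be checked numerically. The sketch below is not the proposed tracker; it only compares a direct circular cross-correlation of two real 1D signals with the same quantity computed as an element-wise product in the Fourier domain (with a complex conjugate on one factor, as correlation requires). The function names are chosen for illustration.

```python
import numpy as np

def circular_correlation_direct(x, y):
    """c[k] = sum_i x[i] * y[(i + k) mod n], computed in O(n^2)."""
    n = len(x)
    return np.array([sum(x[i] * y[(i + k) % n] for i in range(n)) for k in range(n)])

def circular_correlation_fft(x, y):
    """Same quantity in O(n log n): each frequency bin is handled
    independently by one element-wise product."""
    spectrum = np.conj(np.fft.fft(x)) * np.fft.fft(y)
    return np.real(np.fft.ifft(spectrum))

# Hypothetical usage on random real signals.
rng = np.random.default_rng(0)
x = rng.normal(size=16)
y = rng.normal(size=16)
agree = np.allclose(circular_correlation_direct(x, y), circular_correlation_fft(x, y))
```

The per-bin independence visible in `circular_correlation_fft` is what makes it plausible to split a Fourier-domain objective into many small sub-problems.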