EffLoc: Lightweight Vision Transformer for Efficient 6-DOF Camera Relocalization

Feb 21, 2024

Zhendong Xiao, Changhao Chen, Shan Yang, Wu Wei

Abstract: Camera relocalization is pivotal in computer vision, with applications in AR, drones, robotics, and autonomous driving. It estimates the 3D position and orientation (6-DoF pose) of a camera from images. Unlike traditional methods such as SLAM, recent approaches use deep learning for direct end-to-end pose estimation. We propose EffLoc, a novel efficient Vision Transformer for single-image camera relocalization. EffLoc's hierarchical layout, memory-bound self-attention, and feed-forward layers boost memory efficiency and inter-channel communication. Our sequential group attention (SGA) module enhances computational efficiency by diversifying input features, reducing redundancy, and expanding model capacity. EffLoc excels in efficiency and accuracy, outperforming prior methods such as AtLoc and MapNet. It performs well in large-scale outdoor car-driving scenarios while remaining simple and end-to-end trainable, and it eliminates the need for handcrafted loss functions.

* 8 pages, 6 figures; accepted at ICRA 2024
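
To make the 6-DoF pose in the abstract concrete, here is a minimal sketch of how a predicted pose might be compared with a ground-truth pose. It assumes the pose is stored as a 3D translation plus a unit quaternion in (w, x, y, z) order; the abstract does not state the representation or the evaluation metric, so the function names and conventions below are illustrative assumptions, not the authors' code.

```python
import math

def normalize(q):
    """Scale a quaternion (w, x, y, z) to unit length."""
    n = math.sqrt(sum(c * c for c in q))
    if n == 0.0:
        raise ValueError("zero-norm quaternion")
    return tuple(c / n for c in q)

def translation_error(t_pred, t_true):
    """Euclidean distance between predicted and true camera positions."""
    return math.dist(t_pred, t_true)

def rotation_error_deg(q_pred, q_true):
    """Angle in degrees between two orientations given as quaternions.

    Assumed convention: q and -q encode the same rotation, so the
    absolute value of the dot product is used.
    """
    q_pred, q_true = normalize(q_pred), normalize(q_true)
    dot = abs(sum(a * b for a, b in zip(q_pred, q_true)))
    dot = min(1.0, dot)  # guard against rounding slightly above 1
    return math.degrees(2.0 * math.acos(dot))
```

For example, `translation_error((0, 0, 0), (3, 4, 0))` gives the distance between two positions, and `rotation_error_deg` applied to the identity quaternion and a quarter turn about the z-axis gives the angle between them.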
