Abstract: Place recognition is a key module for long-term SLAM systems. Current LiDAR-based place recognition methods are usually based on point cloud representations such as unordered points or range images. These methods achieve high retrieval recall rates, but their performance may degrade under viewpoint variation or scene changes. In this work, we explore the potential of a different representation for place recognition, i.e., bird's eye view (BEV) images. We observe that the structural content of BEV images is less influenced by rotations and translations of point clouds. We validate that, without any delicate design, a simple VGGNet trained on BEV images achieves performance comparable to state-of-the-art place recognition methods in scenes with slight viewpoint changes. For more robust place recognition, we design a rotation-invariant network called BEVPlace. We use group convolution to extract rotation-equivariant local features from the images and NetVLAD for global feature aggregation. In addition, we observe that the distance between BEV features is correlated with the geometric distance between point clouds. Based on this observation, we develop a method to estimate the position of the query cloud, extending the usage of place recognition. Experiments conducted on large-scale public datasets show that our method 1) achieves state-of-the-art performance in terms of recall rates, 2) is robust to view changes, 3) shows strong generalization ability, and 4) can estimate the positions of query point clouds. Source code will be made publicly available at https://github.com/zjuluolun/BEVPlace.
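
To make the BEV representation concrete, the following is a minimal sketch of how a LiDAR point cloud could be projected onto the ground plane as a density image. It is not the BEVPlace implementation: the function name `bev_density_image`, the density encoding, and the `half_range` and `cell` values are assumptions chosen for illustration only.

```python
import numpy as np

def bev_density_image(points, half_range=40.0, cell=0.4):
    """Project an (N, 3) point cloud to a square BEV density image.

    half_range and cell (both in metres) are illustrative assumptions,
    not values taken from the abstract above.
    """
    size = int(round(2 * half_range / cell))
    xy = points[:, :2]
    # Keep only points inside the square window centred on the sensor.
    inside = np.all(np.abs(xy) < half_range, axis=1)
    # Map metric x/y coordinates to integer column/row indices.
    idx = np.floor((xy[inside] + half_range) / cell).astype(int)
    idx = np.clip(idx, 0, size - 1)
    img = np.zeros((size, size), dtype=np.float64)
    # Count points per cell; np.add.at accumulates repeated indices.
    np.add.at(img, (idx[:, 1], idx[:, 0]), 1.0)
    peak = img.max()
    # Normalise to [0, 1]; an empty cloud yields an all-zero image.
    return img / peak if peak > 0 else img
```

Under this encoding, a rotation of the cloud about the vertical axis appears as an in-plane rotation of the image (up to discretization), which is the kind of structure a rotation-equivariant feature extractor can exploit.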
Abstract: Place recognition plays a crucial role in re-localization and loop closure detection for robots and vehicles. This paper seeks a well-defined global descriptor for LiDAR-based place recognition. Compared to local descriptors, global descriptors show remarkable performance in urban road scenes but are usually viewpoint-dependent. To address this, we propose a simple yet robust global descriptor dubbed FreSCo that decomposes the viewpoint difference during a revisit and achieves both translation and rotation invariance by leveraging the Fourier transform and a circular shift technique. In addition, a fast two-stage pose estimation method is proposed to estimate the relative pose after place retrieval by utilizing a compact 2D point cloud extracted from the scenes. Experiments show that FreSCo outperforms contemporaneous methods on sequences of different scenes from multiple datasets. The code will be publicly available at https://github.com/soytony/FreSCo.
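
The two generic building blocks named in this abstract can be illustrated in isolation. The sketch below is not the FreSCo descriptor: it only shows that the magnitude of a 2D discrete Fourier transform is unchanged by circular shifts of the input, and that an unknown circular shift along one axis can be recovered by circular cross-correlation. The function names are hypothetical, and treating a rotation as a circular shift along the angular axis of a polar image is an assumption made for illustration.

```python
import numpy as np

def shift_invariant_magnitude(img):
    """Magnitude of the 2D DFT; invariant to circular shifts of img."""
    return np.abs(np.fft.fft2(img))

def best_circular_shift(a, b):
    """Return s such that np.roll(a, s, axis=0) best matches b.

    Uses circular cross-correlation along axis 0, computed with the FFT
    and summed over axis 1. Inputs are real arrays of equal shape.
    """
    fa = np.fft.fft(a, axis=0)
    fb = np.fft.fft(b, axis=0)
    corr = np.fft.ifft(fb * np.conj(fa), axis=0).real.sum(axis=1)
    return int(np.argmax(corr))

# Usage: rows play the role of an angular axis (an illustrative assumption).
rng = np.random.default_rng(0)
a = rng.random((32, 16))
b = np.roll(a, 5, axis=0)
recovered = best_circular_shift(a, b)
```

The Fourier magnitude discards the shift entirely, while the cross-correlation keeps it, so the first is suited to building an invariant signature and the second to estimating the offset between two observations.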