Abstract:Simultaneous Localization and Mapping (SLAM) is a key component of autonomous systems operating in environments that require a consistent map for reliable localization. SLAM has been a widely studied topic for decades with most of the solutions being camera or LiDAR based. Early LiDAR-based approaches primarily relied on 2D data, whereas more recent frameworks use 3D data. In this work, we survey recent 3D LiDAR-based Graph-SLAM methods in urban environments, aiming to compare their strengths, weaknesses, and limitations. Additionally, we evaluate their robustness regarding the LiDAR resolution namely 64 $vs$ 128 channels. Regarding SLAM methods, we evaluate SC-LeGO-LOAM, SC-LIO-SAM, Cartographer, and HDL-Graph on real-world urban environments using the KITTI odometry dataset (a LiDAR with 64-channels only) and a new dataset (AUTONOMOS-LABS). The latter dataset, collected using instrumented vehicles driving in Berlin suburban area, comprises both 64 and 128 LiDARs. The experimental results are reported in terms of quantitative `metrics' and complemented by qualitative maps.
Abstract:3D LiDAR-based place recognition has been extensively researched in urban environments, yet it remains underexplored in agricultural settings. Unlike urban contexts, horticultural environments, characterized by their permeability to laser beams, result in sparse and overlapping LiDAR scans with suboptimal geometries. This phenomenon leads to intra- and inter-row descriptor ambiguity. In this work, we address this challenge by introducing SPVSoAP3D, a novel modeling approach that combines a voxel-based feature extraction network with an aggregation technique based on a second-order average pooling operator, complemented by a descriptor enhancement stage. Furthermore, we augment the existing HORTO-3DLM dataset by introducing two new sequences derived from horticultural environments. We evaluate the performance of SPVSoAP3D against state-of-the-art (SOTA) models, including OverlapTransformer, PointNetVLAD, and LOGG3D-Net, utilizing a cross-validation protocol on both the newly introduced sequences and the existing HORTO-3DLM dataset. The findings indicate that the average operator is more suitable for horticultural environments compared to the max operator and other first-order pooling techniques. Additionally, the results highlight the improvements brought by the descriptor enhancement stage.
Abstract:This paper addresses robotic place recognition in horticultural environments using 3D-LiDAR technology and deep learning. Three main contributions are proposed: (i) a novel model called PointNetPGAP, which combines a global average pooling aggregator and a pairwise feature interaction aggregator; (ii) a Segment-Level Consistency (SLC) model, used only during training, with the goal of augmenting the contrastive loss with a context-specific training signal to enhance descriptors; and (iii) a novel dataset named HORTO-3DLM featuring sequences from orchards and strawberry plantations. The experimental evaluation, conducted on the new HORTO-3DLM dataset, compares PointNetPGAP at the sequence- and segment-level with state-of-the-art (SOTA) models, including OverlapTransformer, PointNetVLAD, and LOGG3D. Additionally, all models were trained and evaluated using the SLC. Empirical results obtained through a cross-validation evaluation protocol demonstrate the superiority of PointNetPGAP compared to existing SOTA models. PointNetPGAP emerges as the best model in retrieving the top-1 candidate, outperforming PointNetVLAD (the second-best model). Moreover, when comparing the impact of training with the SLC model, performance increased on four out of the five evaluated models, indicating that adding a context-specific signal to the contrastive loss leads to improved descriptors.
Abstract:Robust and reliable place recognition and loop closure detection in agricultural environments is still an open problem. In particular, orchards are a difficult case study due to structural similarity across the entire field. In this work, we address the place recognition problem in orchards resorting to 3D LiDAR data, which is considered a key modality for robustness. Hence, we propose ORCHNet, a deep-learning-based approach that maps 3D-LiDAR scans to global descriptors. Specifically, this work proposes a new global feature aggregation approach, which fuses multiple aggregation methods into a robust global descriptor. ORCHNet is evaluated on real-world data collected in orchards, comprising data from the summer and autumn seasons. To assess the robustness, We compare ORCHNet with state-of-the-art aggregation approaches on data from the same season and across seasons. Moreover, we additionally evaluate the proposed approach as part of a localization framework, where ORCHNet is used as a loop closure detector. The empirical results indicate that, on the place recognition task, ORCHNet outperforms the remaining approaches, and is also more robust across seasons. As for the localization, the edge cases where the path goes through the trees are solved when integrating ORCHNet as a loop detector, showing the potential applicability of the proposed approach in this task. The code and dataset will be publicly available at:\url{https://github.com/Cybonic/ORCHNet.git}
Abstract:Digital agriculture has evolved significantly over the last few years due to the technological developments in automation and computational intelligence applied to the agricultural sector, including vineyards which are a relevant crop in the Mediterranean region. In this paper, a study of semantic segmentation for vine detection in real-world vineyards is presented by exploring state-of-the-art deep segmentation networks and conventional unsupervised methods. Camera data was collected on vineyards using an Unmanned Aerial System (UAS) equipped with a dual imaging sensor payload, namely a high-resolution color camera and a five-band multispectral and thermal camera. Extensive experiments of the segmentation networks and unsupervised methods have been performed on multimodal datasets representing three distinct vineyards located in the central region of Portugal. The reported results indicate that the best segmentation performances are obtained with deep networks, while traditional (non-deep) approaches using the NIR band shown competitive results. The results also show that multimodality slightly improves the performance of vine segmentation but the NIR spectrum alone generally is sufficient on most of the datasets. The code and dataset are publicly available on https://github.com/Cybonic/DL_vineyard_segmentation_study.git