Abstract: In autonomous driving and robotics, simultaneous localization and mapping (SLAM) and multi-object tracking (MOT) are two fundamental problems that are generally addressed separately. Solutions to SLAM and MOT usually rely on certain assumptions, such as a static environment for SLAM and an accurate ego-vehicle pose for MOT. In complex dynamic environments, however, these assumptions are difficult or even impossible to satisfy. SLAMMOT, i.e., simultaneous localization, mapping, and moving object tracking, has therefore emerged as an integrated system of SLAM and object tracking for autonomous vehicles in dynamic environments. Many conventional SLAMMOT solutions perform data association directly on the predictions and detections for object tracking but ignore their quality. In practice, inaccurate predictions caused by missed detections over several consecutive frames in temporary occlusion scenarios may degrade tracking performance and thereby affect SLAMMOT. To address this challenge, this paper presents a LiDAR SLAMMOT method based on confidence-guided data association (Conf SLAMMOT), which tightly couples LiDAR SLAM and multi-object tracking based on confidence-guided data association in a graph optimization backend that estimates the states of the ego-vehicle and the objects simultaneously. The confidences of the predictions and detections are used for data association in the factor graph-based multi-object tracking, which not only avoids the performance degradation caused by incorrect initial assignments in some filter-based methods but also handles issues such as continuous missed detections in tracking, improving the overall performance of SLAMMOT. Comparative experiments demonstrate the advantages of Conf SLAMMOT, especially in scenes with missed detections.
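The idea of letting prediction and detection confidence guide data association can be illustrated with a small sketch. This is not the factor graph formulation of Conf SLAMMOT, which the abstract does not spell out; it is a plain greedy gating scheme written only to show the intuition. The decay model in `prediction_confidence`, the parameters `decay`, `gate`, and `min_score`, and the cost definition are all hypothetical choices made for this example.

```python
import math


def prediction_confidence(missed_frames, decay=0.7):
    """Hypothetical model: confidence of a track prediction shrinks
    with every consecutive frame in which the object was not detected."""
    return decay ** missed_frames


def associate(predictions, detections, gate=2.0, min_score=0.3):
    """Greedy association on a confidence-scaled distance (illustrative only).

    predictions: list of (x, y, missed_frames)
    detections:  list of (x, y, score) with score in (0, 1]
    Returns a dict mapping prediction index -> detection index.
    """
    candidates = []
    for i, (px, py, missed) in enumerate(predictions):
        # Assumption: a less confident prediction gets a larger spatial
        # tolerance, so a track that drifted during occlusion can be recovered.
        sigma = 1.0 / prediction_confidence(missed)
        for j, (dx, dy, score) in enumerate(detections):
            if score < min_score:
                continue  # discard detections the detector itself distrusts
            cost = math.hypot(px - dx, py - dy) / sigma
            if cost <= gate:
                # Rank pairs so that confident detections are preferred.
                candidates.append((cost / score, i, j))

    matches, used = {}, set()
    for _, i, j in sorted(candidates):
        if i not in matches and j not in used:
            matches[i] = j
            used.add(j)
    return matches


# One freshly updated track and one track predicted through 3 missed frames.
tracks = [(0.0, 0.0, 0), (10.0, 0.0, 3)]
dets = [(0.5, 0.0, 0.9), (13.0, 0.0, 0.8), (20.0, 20.0, 0.1)]
matches = associate(tracks, dets)
```

In this toy setup the occluded track is 3 m away from its detection. A fixed gate would reject that pair, whereas the confidence-scaled gate accepts it because the prediction is known to be unreliable after three missed frames.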
Abstract: SLAMMOT, i.e., simultaneous localization, mapping, and moving object (detection and) tracking, is an emerging technology for autonomous vehicles in dynamic environments. Such single-vehicle systems still have inherent limitations, such as occlusion. Inspired by SLAMMOT and rapidly evolving cooperative technologies, it is natural to explore cooperative simultaneous localization, mapping, and moving object (detection and) tracking (C-SLAMMOT) to enhance state estimation for ego-vehicles and moving objects. C-SLAMMOT could significantly improve on single-vehicle performance by integrating the information shared through communication among multiple vehicles. This inevitably leads to a fundamental trade-off between performance and communication cost, especially as the number of collaborating vehicles grows. To address this challenge, we propose a LiDAR-based communication-efficient C-SLAMMOT (CE C-SLAMMOT) method that determines the number of collaborating vehicles. In CE C-SLAMMOT, we adopt descriptor-based methods for enhancing ego-vehicle pose estimation and spatial confidence map-based methods for cooperative object perception, allowing for the continuous and dynamic selection of the critical collaborating vehicles and the content they exchange. Compared to the baseline method of exchanging raw observations among all vehicles, this approach avoids spending communication resources on information from collaborating vehicles that may contribute little or no performance gain. Comparative experiments confirm that the proposed method achieves a good trade-off between performance and communication cost while also outperforming previous state-of-the-art methods in cooperative perception.
Abstract: Accurate and efficient localization with a conveniently established map is a fundamental requirement for mobile robot operation in warehouse environments. An accurate AprilTag map can be conveniently established with the help of LiDAR-based SLAM. A LiDAR-based system is usually less commercially competitive than a vision-based system; fortunately, for warehouse applications, only a single LiDAR-based SLAM system is needed to establish an accurate AprilTag map, whereas a large number of visual localization systems can share this map for their own operations. The cost of the LiDAR-based SLAM system is therefore shared among the many visual localization systems and turns out to be acceptable, even negligible, for practical warehouse applications. Once an accurate AprilTag map is available, visual localization is realized as recursive estimation that fuses AprilTag measurements (i.e., AprilTag detection results) and robot motion data. AprilTag measurements may be nonlinear partial measurements; these can be handled by the well-known extended Kalman filter (EKF) in the spirit of local linearization. AprilTag measurements also tend to be temporally correlated, which the EKF cannot reasonably handle. The split covariance intersection filter (Split CIF) is therefore adopted to handle temporal correlation among AprilTag measurements. The Split CIF (in the spirit of local linearization) can also handle nonlinear partial AprilTag measurements. The Split CIF based visual localization system incorporates a measurement adaptive mechanism to handle outliers in AprilTag measurements and adopts a dynamic initialization mechanism to address the kidnapping problem. A comparative study in real warehouse environments demonstrates the potential and advantage of the Split CIF based visual localization solution.
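To make the difference between a Kalman-style update and a split covariance intersection update concrete, the following is a minimal scalar sketch of the standard Split CI fusion step, assuming a direct measurement of the state (measurement matrix equal to 1). It is not the localization system described above: the function name `split_ci_update`, its argument names, and the grid search over the weight `w` are choices made for this illustration. Each covariance is split into a possibly correlated part (`*_d`) and a known-independent part (`*_i`).

```python
def split_ci_update(x, p_d, p_i, z, r_d, r_i, steps=99):
    """Scalar split covariance intersection update (illustrative sketch).

    x, p_d, p_i : prior estimate with dependent / independent covariance parts
    z, r_d, r_i : measurement with dependent / independent covariance parts
    Returns (x_new, p_d_new, p_i_new, w).
    """
    best = None
    for k in range(1, steps + 1):
        w = k / (steps + 1)          # weight strictly inside (0, 1)
        p1 = p_d / w + p_i           # inflate only the dependent parts
        p2 = r_d / (1.0 - w) + r_i
        gain = p1 / (p1 + p2)
        p = (1.0 - gain) * p1        # fused total covariance for this w
        if best is None or p < best[0]:
            best = (p, w, gain)

    p, w, gain = best
    x_new = x + gain * (z - x)
    # The independent part propagates exactly as in a Kalman update.
    p_i_new = (1.0 - gain) ** 2 * p_i + gain ** 2 * r_i
    p_d_new = p - p_i_new            # the remainder is treated as dependent
    return x_new, p_d_new, p_i_new, w


# Fully independent information: the update reduces to a Kalman update.
kalman_like = split_ci_update(0.0, 0.0, 1.0, 2.0, 0.0, 1.0)

# Fully (possibly) correlated information: plain covariance intersection.
ci_like = split_ci_update(0.0, 1.0, 0.0, 2.0, 1.0, 0.0)
```

When both dependent parts are zero, the weight has no effect and the result matches a Kalman update. When everything is treated as possibly correlated, the fused covariance does not shrink below the prior in this example, which is the conservative behavior that makes the filter suitable for temporally correlated measurements.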