Abstract: Accurate and reliable positioning is crucial for perception, decision-making, and other high-level applications in autonomous driving, unmanned aerial vehicles, and intelligent robots. Given the inherent limitations of standalone sensors, integrating heterogeneous sensors with complementary capabilities is one of the most effective approaches to achieving this goal. In this paper, we propose a filtering-based, tightly coupled global navigation satellite system (GNSS)-visual-inertial positioning framework that applies a pose-only formulation to the visual-inertial system (VINS), termed PO-GVINS. Specifically, the multiple-view imaging model used in current VINS requires prior estimates of the 3D features and then jointly estimates camera poses and 3D feature positions, which inevitably introduces linearization errors for the features and leads to a dimensional explosion of the state vector. In contrast, the pose-only (PO) formulation, which has been shown to be equivalent to multiple-view imaging and has been applied in visual reconstruction, represents feature depth using two camera poses; the 3D feature positions are thus removed from the state vector, avoiding the aforementioned difficulties. Inspired by this, we first apply the PO formulation to our VINS, i.e., PO-VINS. GNSS raw measurements, with integer ambiguities resolved, are then incorporated to achieve accurate and drift-free estimation. Extensive experiments demonstrate that the proposed PO-VINS significantly outperforms the multi-state constrained Kalman filter (MSCKF). By incorporating GNSS measurements, PO-GVINS achieves accurate, drift-free state estimation, making it a robust solution for positioning in challenging environments.
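To make the pose-only idea concrete, the following is a minimal two-view sketch (an illustration of the general principle, not the exact PO-GVINS derivation): let $\mathbf{x}_1$ and $\mathbf{x}_2$ be the normalized observations of a feature in two frames, $(\mathbf{R}, \mathbf{t})$ the relative pose from frame 1 to frame 2, and $d_1$, $d_2$ the unknown depths. The rigid-body relation gives

\[
d_2\,\mathbf{x}_2 = d_1\,\mathbf{R}\,\mathbf{x}_1 + \mathbf{t}
\quad\Longrightarrow\quad
d_1\,(\mathbf{x}_2 \times \mathbf{R}\,\mathbf{x}_1) = -\,\mathbf{x}_2 \times \mathbf{t}
\quad\Longrightarrow\quad
d_1 = \frac{\lVert \mathbf{x}_2 \times \mathbf{t} \rVert}{\lVert \mathbf{x}_2 \times \mathbf{R}\,\mathbf{x}_1 \rVert},
\]

so the feature position $d_1\,\mathbf{x}_1$ is a function of the two camera poses and the observations alone and need not be carried in the state vector.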
Abstract: The essence of navigation, perception, and decision-making, which are basic tasks for intelligent robots, is to estimate the necessary system states. Among them, navigation is fundamental to the other, higher-level applications, as it provides precise position and orientation by integrating measurements from multiple sensors. With the observations of each sensor appropriately modelled, multi-sensor fusion for navigation reduces to a state estimation problem, which can be solved by two approaches: optimization and filtering. Recent research has shown that optimization-based frameworks outperform filtering-based ones in terms of accuracy. However, both methods are based on maximum likelihood estimation (MLE) and should be theoretically equivalent given the same linearization points, observation models, measurements, and Gaussian noise assumptions. In this paper, we dig deeply into the theory and the existing strategies used in both optimization-based and filtering-based approaches. We demonstrate that the two methods are theoretically equivalent, but that this equivalence breaks down because of the different strategies applied in real-time operation. By adjusting the existing strategies of the filtering-based approach, Monte-Carlo simulations and vehicular ablation experiments based on visual odometry (VO) indicate that the strategy-adjusted filtering is strictly equivalent to optimization. Therefore, future research on sensor-fusion problems should concentrate on the algorithms and strategies themselves rather than on the choice of state estimation approach.
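As a minimal illustration of the claimed equivalence (a standard linear-Gaussian identity, stated here in generic notation rather than drawn from the paper's full derivation): with prior $\mathbf{x} \sim \mathcal{N}(\hat{\mathbf{x}}, \mathbf{P})$, measurement $\mathbf{z} = h(\mathbf{x}) + \mathbf{v}$, $\mathbf{v} \sim \mathcal{N}(\mathbf{0}, \mathbf{R})$, and $h$ linearized at $\hat{\mathbf{x}}$ with Jacobian $\mathbf{H}$, the MAP/MLE cost and its minimizer are

\[
\hat{\mathbf{x}}^{+} = \arg\min_{\mathbf{x}}\;
\lVert \mathbf{x} - \hat{\mathbf{x}} \rVert^{2}_{\mathbf{P}^{-1}}
+ \lVert \mathbf{z} - h(\hat{\mathbf{x}}) - \mathbf{H}(\mathbf{x} - \hat{\mathbf{x}}) \rVert^{2}_{\mathbf{R}^{-1}}
= \hat{\mathbf{x}} + \mathbf{K}\bigl(\mathbf{z} - h(\hat{\mathbf{x}})\bigr),
\qquad
\mathbf{K} = \mathbf{P}\mathbf{H}^{\top}\bigl(\mathbf{H}\mathbf{P}\mathbf{H}^{\top} + \mathbf{R}\bigr)^{-1},
\]

i.e., one Gauss-Newton step of the optimization cost at the same linearization point reproduces the Kalman update; any difference between the two frameworks therefore arises from run-time strategies such as relinearization, marginalization, and window size rather than from the estimation theory itself.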