https://github.com/mingyuyng/Visual-Selective-VIO
In recent years, deep learning-based approaches for visual-inertial odometry (VIO) have shown remarkable performance outperforming traditional geometric methods. Yet, all existing methods use both the visual and inertial measurements for every pose estimation incurring potential computational redundancy. While visual data processing is much more expensive than that for the inertial measurement unit (IMU), it may not always contribute to improving the pose estimation accuracy. In this paper, we propose an adaptive deep-learning based VIO method that reduces computational redundancy by opportunistically disabling the visual modality. Specifically, we train a policy network that learns to deactivate the visual feature extractor on the fly based on the current motion state and IMU readings. A Gumbel-Softmax trick is adopted to train the policy network to make the decision process differentiable for end-to-end system training. The learned strategy is interpretable, and it shows scenario-dependent decision patterns for adaptive complexity reduction. Experiment results show that our method achieves a similar or even better performance than the full-modality baseline with up to 78.8% computational complexity reduction for KITTI dataset evaluation. Our code will be shared in