Abstract:Current visual SLAM systems face significant challenges in balancing computational efficiency with robust loop closure handling. Traditional approaches require careful manual tuning and incur substantial computational overhead, while learning-based methods either lack explicit loop closure capabilities or implement them through computationally expensive methods. We present AutoLoop, a novel approach that combines automated curriculum learning with efficient fine-tuning for visual SLAM systems. Our method employs a DDPG (Deep Deterministic Policy Gradient) agent to dynamically adjust loop closure weights during training, eliminating the need for manual hyperparameter search while significantly reducing the required training steps. The approach pre-computes potential loop closure pairs offline and leverages them through an agent-guided curriculum, allowing the model to adapt efficiently to new scenarios. Experiments conducted on TartanAir for training and validated across multiple benchmarks including KITTI, EuRoC, ICL-NUIM and TUM RGB-D demonstrate that AutoLoop achieves comparable or superior performance while reducing training time by an order of magnitude compared to traditional approaches. AutoLoop provides a practical solution for rapid adaptation of visual SLAM systems, automating the weight tuning process that traditionally requires multiple manual iterations. Our results show that this automated curriculum strategy not only accelerates training but also maintains or improves the model's performance across diverse environmental conditions.
Abstract:For many practical applications, a high computational cost of inference over deep network architectures might be unacceptable. A small degradation in the overall inference accuracy might be a reasonable price to pay for a significant reduction in the required computational resources. In this work, we describe a method for introducing "shortcuts" into the DNN feedforward inference process by skipping costly feedforward computations whenever possible. The proposed method is based on the previously described BranchyNet (Teerapittayanon et al., 2016) and the EEnet (Demir, 2019) architectures that jointly train the main network and early exit branches. We extend those methods by attaching branches to pre-trained models and, thus, eliminating the need to alter the original weights of the network. We also suggest a new branch architecture based on convolutional building blocks to allow enough training capacity when applied on large DNNs. The proposed architecture includes confidence heads that are used for predicting the confidence level in the corresponding early exits. By defining adjusted thresholds on these confidence extensions, we can control in real-time the amount of data exiting from each branch and the overall tradeoff between speed and accuracy of our model. In our experiments, we evaluate our method using image datasets (SVHN and CIFAR10) and several DNN architectures (ResNet, DenseNet, VGG) with varied depth. Our results demonstrate that the proposed method enables us to reduce the average inference computational cost and further controlling the tradeoff between the model accuracy and the computation cost.
Abstract:Curriculum Learning (CL), drawing inspiration from natural learning patterns observed in humans and animals, employs a systematic approach of gradually introducing increasingly complex training data during model development. Our work applies innovative CL methodologies to address the challenging geometric problem of monocular Visual Odometry (VO) estimation, which is essential for robot navigation in constrained environments. The primary objective of our research is to push the boundaries of current state-of-the-art (SOTA) benchmarks in monocular VO by investigating various curriculum learning strategies. We enhance the end-to-end Deep-Patch-Visual Odometry (DPVO) framework through the integration of novel CL approaches, with the goal of developing more resilient models capable of maintaining high performance across challenging environments and complex motion scenarios. Our research encompasses several distinctive CL strategies. We develop methods to evaluate sample difficulty based on trajectory motion characteristics, implement sophisticated adaptive scheduling through self-paced weighted loss mechanisms, and utilize reinforcement learning agents for dynamic adjustment of training emphasis. Through comprehensive evaluation on the real-world TartanAir dataset, our Curriculum Learning-based Deep-Patch-Visual Odometry (CL-DPVO) demonstrates superior performance compared to existing SOTA methods, including both feature-based and learning-based VO approaches. The results validate the effectiveness of integrating curriculum learning principles into visual odometry systems.