Yipu Zhao

Georgia Institute of Technology

OmniPose6D: Towards Short-Term Object Pose Tracking in Dynamic Scenes from Monocular RGB

Oct 09, 2024

Yunzhi Lin, Yipu Zhao, Fu-Jen Chu, Xingyu Chen, Weiyao Wang, Hao Tang, Patricio A. Vela, Matt Feiszli, Kevin Liang


Abstract: To address the challenge of short-term object pose tracking in dynamic environments from monocular RGB input, we introduce OmniPose6D, a large-scale synthetic dataset crafted to mirror the diversity of real-world conditions. We additionally present a benchmarking framework for comprehensive comparison of pose tracking algorithms. We propose a pipeline featuring an uncertainty-aware keypoint refinement network that employs probabilistic modeling to refine pose estimation. Comparative evaluations demonstrate that our approach outperforms existing baselines on real datasets, underscoring the effectiveness of our synthetic dataset and refinement technique in improving tracking precision in dynamic contexts. Our contributions set a new precedent for the development and assessment of object pose tracking methodologies in complex scenes.

* 13 pages, 9 figures
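The abstract does not spell out the refinement network, so the following is only a minimal sketch of one generic idea behind "uncertainty-aware" keypoint refinement: if each keypoint prediction comes with a variance (an assumed isotropic Gaussian model), independent predictions can be fused by inverse-variance weighting so that confident ones dominate. `Keypoint2D` and `fuse_keypoints` are hypothetical names, not part of the OmniPose6D pipeline.

```python
from dataclasses import dataclass


@dataclass
class Keypoint2D:
    """Hypothetical 2D keypoint with an isotropic Gaussian uncertainty (variance in px^2)."""
    u: float
    v: float
    var: float


def fuse_keypoints(estimates: list[Keypoint2D]) -> Keypoint2D:
    """Inverse-variance (precision-weighted) fusion of independent Gaussian estimates.

    Each estimate is weighted by 1/var, so confident predictions dominate and
    uncertain ones contribute little. The fused variance is 1 / sum(1/var_i).
    """
    if not estimates:
        raise ValueError("need at least one estimate")
    precisions = [1.0 / k.var for k in estimates]
    total = sum(precisions)
    u = sum(w * k.u for w, k in zip(precisions, estimates)) / total
    v = sum(w * k.v for w, k in zip(precisions, estimates)) / total
    return Keypoint2D(u, v, 1.0 / total)
```

For example, fusing a tracked keypoint at (100, 50) with variance 4.0 and a refined prediction at (104, 54) with variance 1.0 gives (103.2, 53.2) with variance 0.8: the result is pulled toward the more confident estimate and is more certain than either input.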


Long-term Visual Map Sparsification with Heterogeneous GNN

Mar 29, 2022

Ming-Fang Chang, Yipu Zhao, Rajvi Shah, Jakob J. Engel, Michael Kaess, Simon Lucey


Abstract: We address the problem of map sparsification for long-term visual localization. Map sparsification methods commonly assume that the pre-built map and the later-captured localization queries are consistent. However, this assumption is easily violated in a dynamic world. Additionally, the map grows as new data accumulate over time, causing large data overhead in the long term. In this paper, we aim to overcome environmental changes and reduce the map size at the same time by selecting points that are valuable for future localization. Inspired by recent progress in Graph Neural Networks (GNNs), we propose the first work that models SfM maps as heterogeneous graphs and predicts 3D point importance scores with a GNN, which enables us to directly exploit the rich information in the SfM map graph. Two novel supervision terms are proposed: 1) a data-fitting term for selecting points valuable for future localization based on training queries; 2) a K-Cover term for selecting sparse points with full map coverage. The experiments show that our method selects map points on stable and widely visible structures and outperforms baselines in localization performance.

* Accepted by CVPR 2022
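The paper learns point importance scores with a GNN; its K-Cover term encourages a sparse selection in which every image still sees enough map points. As a hedged illustration of that coverage objective only (not the GNN or its loss), the classical greedy heuristic below picks points until every frame observes at least `k` selected points or a point budget runs out. The visibility-map input and the function name `greedy_k_cover` are assumptions for illustration.

```python
def greedy_k_cover(visibility: dict[int, set[int]], k: int, budget: int) -> list[int]:
    """Greedy point selection for a K-cover objective (illustrative sketch).

    visibility maps point id -> set of frame ids that observe the point.
    Each pick maximizes the number of frames that still see fewer than k
    selected points. Stops when every frame is k-covered, the budget is
    spent, or no remaining point helps.
    """
    frames = set().union(*visibility.values())
    coverage = {f: 0 for f in frames}  # selected points seen by each frame
    remaining = set(visibility)
    selected: list[int] = []

    def gain(p: int) -> int:
        # Number of under-covered frames that would benefit from picking p.
        return sum(1 for f in visibility[p] if coverage[f] < k)

    while remaining and len(selected) < budget:
        best = max(sorted(remaining), key=gain)  # ties go to the smallest id
        if gain(best) == 0:
            break  # no remaining point improves coverage
        selected.append(best)
        remaining.remove(best)
        for f in visibility[best]:
            coverage[f] += 1
    return selected
```

For example, with `k=2` and points 0 to 4 observed by frames {0, 1, 2}, {0}, {1}, {2}, and {0, 1}, the sketch selects points [0, 4, 3].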


Distributed Client-Server Optimization for SLAM with Limited On-Device Resources

Mar 26, 2021

Yetong Zhang, Ming Hsiao, Yipu Zhao, Jing Dong, Jakob J. Enge


Abstract: Simultaneous localization and mapping (SLAM) is a crucial functionality for exploration robots and virtual/augmented reality (VR/AR) devices. However, some such devices have limited resources and cannot afford the computational or memory cost of running full SLAM algorithms. We propose a general client-server SLAM optimization framework that achieves accurate real-time state estimation on the device with low on-board resource requirements. The resource-limited device (the client) works on only a small part of the map, and the rest of the map is processed by the server. Sending summarized information about the rest of the map to the client makes the on-device state estimation more accurate. Accuracy improves further in the presence of on-device early loop closures, which enable reloading useful variables from the server to the client. Experimental results on both synthetic and real-world datasets demonstrate that the proposed framework achieves accurate real-time estimation within the limited computation and memory budget of the device.

* Accepted at ICRA 2021
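The abstract says the server sends "summarized information" about the rest of the map to the client but does not say how it is computed. One standard way to summarize variables in a least-squares (information-form) problem is Schur-complement marginalization; the sketch below assumes that technique and shows that the reduced client system yields the same client estimate as the full system. All names are hypothetical, and plain lists are used so the example has no dependencies.

```python
Matrix = list[list[float]]


def inverse(a: Matrix) -> Matrix:
    """Gauss-Jordan inverse with partial pivoting (fine for small dense blocks)."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [x / scale for x in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def block(a: Matrix, rows: list[int], cols: list[int]) -> Matrix:
    return [[a[r][c] for c in cols] for r in rows]


def summarize_for_client(info: Matrix, eta: list[float], client: list[int], server: list[int]):
    """Marginalize server-side variables out of the normal equations info @ x = eta.

    Returns (info_c, eta_c) with
        info_c = L_cc - L_cs L_ss^{-1} L_sc
        eta_c  = eta_c - L_cs L_ss^{-1} eta_s
    so solving info_c @ x_c = eta_c gives the same x_c as the full system.
    """
    gain = matmul(block(info, client, server), inverse(block(info, server, server)))
    reduction = matmul(gain, block(info, server, client))  # L_cs L_ss^{-1} L_sc
    info_c = [[a - b for a, b in zip(ra, rb)]
              for ra, rb in zip(block(info, client, client), reduction)]
    correction = matmul(gain, [[eta[i]] for i in server])  # L_cs L_ss^{-1} eta_s
    eta_c = [eta[i] - correction[k][0] for k, i in enumerate(client)]
    return info_c, eta_c
```

For the 3-variable system with information matrix [[4, 1, 0], [1, 3, 1], [0, 1, 2]] and vector [1, 2, 3], keeping variable 0 on the client gives the reduced system 3.6 x0 = 0.8, i.e. x0 = 2/9, matching the full solve. Only the small reduced matrix needs to live on the device.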


Trajectory Servoing: Image-Based Trajectory Tracking Using SLAM

Mar 06, 2021

Shiyu Feng, Zixuan Wu, Yipu Zhao, Patricio A. Vela


Abstract: This paper describes an image-based visual servoing (IBVS) system that enables a nonholonomic robot to follow trajectories accurately without real-time robot pose information and without a known visual map of the environment. We call it trajectory servoing. The critical component is a feature-based, indirect SLAM method that provides a pool of available features with estimated depth, so that they can be propagated forward in time to generate image feature trajectories for visual servoing. Short- and long-distance experiments show the benefits of trajectory servoing for navigating unknown areas without absolute positioning. Trajectory servoing is shown to be more accurate than pose-based feedback when both rely on the same underlying SLAM system.
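The core step described above is propagating SLAM features with estimated depth forward in time to obtain image feature trajectories. A minimal sketch, assuming a forward-facing pinhole camera mounted at the origin of a planar (x, y, theta) robot with camera axes z forward, x right, y down: each planned pose is used to re-project a known 3D feature, producing the pixel track a servoing controller could follow. The function name `project_along_trajectory`, the mounting, and the intrinsics are assumptions.

```python
import math


def project_along_trajectory(point_w, poses, fx, fy, cx, cy, cam_height):
    """Predict the pixel track of one 3D feature over a sequence of planar robot poses.

    point_w: (X, Y, Z) in the world frame, Z up (e.g. a SLAM map point with depth).
    poses: iterable of (x, y, theta) robot poses on the ground plane.
    Assumes a forward-facing pinhole camera at the robot origin, cam_height
    above ground, with camera axes z forward, x right, y down.
    Returns a list of (u, v) pixels, or None where the point is behind the camera.
    """
    X, Y, Z = point_w
    track = []
    for x, y, theta in poses:
        dx, dy = X - x, Y - y
        c, s = math.cos(theta), math.sin(theta)
        # World -> robot frame (rotate by -theta): forward and left components.
        forward = c * dx + s * dy
        left = -s * dx + c * dy
        # Robot -> camera frame: right = -left, down = -(height above camera).
        xc, yc, zc = -left, -(Z - cam_height), forward
        if zc <= 1e-6:
            track.append(None)
            continue
        track.append((fx * xc / zc + cx, fy * yc / zc + cy))
    return track
```

For a point at (4, -1, 1) m with fx = 500 and cx = 320, moving the robot from x = 0 to x = 2 m shifts the predicted pixel from u = 445 to u = 570, the outward drift expected as the camera approaches an off-axis point.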


Good Graph to Optimize: Cost-Effective, Budget-Aware Bundle Adjustment in Visual SLAM

Aug 23, 2020

Yipu Zhao, Justin S. Smith, Patricio A. Vela


Abstract: The cost-efficiency of visual(-inertial) SLAM (VSLAM) is a critical characteristic for resource-limited applications. While hardware and algorithm advances have significantly improved the cost-efficiency of VSLAM front-ends, the cost-efficiency of VSLAM back-ends remains a bottleneck. This paper describes a novel, rigorous method to improve the cost-efficiency of local bundle adjustment (BA) in a BA-based VSLAM back-end. An efficient algorithm, called Good Graph, is developed to select the size-reduced graphs optimized in local BA while preserving conditioning. To better suit BA-based VSLAM back-ends, Good Graph predicts future estimation needs, dynamically assigns an appropriate size budget, and selects a condition-maximized subgraph for BA estimation. Evaluations are conducted in two scenarios: 1) VSLAM as a standalone process, and 2) VSLAM as part of a closed-loop navigation system. Results from the first scenario show that Good Graph improves the accuracy and robustness of VSLAM estimation when computational limits exist. Results from the second scenario indicate that Good Graph benefits the trajectory tracking performance of VSLAM-based closed-loop navigation systems, a primary application of VSLAM.

* 20 pages, 14 figures, 8 tables. Submitted to IEEE Transactions on Robotics, for the provided open-source software see https://github.com/ivalab/gf_orb_slam2


Closed-Loop Benchmarking of Stereo Visual-Inertial SLAM Systems: Understanding the Impact of Drift and Latency on Tracking Accuracy

Mar 07, 2020

Yipu Zhao, Justin S. Smith, Sambhu H. Karumanchi, Patricio A. Vela


Abstract: Visual-inertial SLAM is essential for robot navigation in GPS-denied environments, e.g., indoors or underground. Conventionally, the performance of visual-inertial SLAM is evaluated with open-loop analysis, with a focus on the drift level of SLAM systems. In this paper, we raise the question of the importance of visual estimation latency in closed-loop navigation tasks, such as accurate trajectory tracking. To understand the impact of both drift and latency on visual-inertial SLAM systems, we conduct a closed-loop benchmarking simulation in which a robot is commanded to follow a desired trajectory using feedback from visual-inertial estimation. By extensively evaluating the trajectory tracking performance of representative state-of-the-art visual-inertial SLAM systems, we reveal the importance of reducing latency in the visual estimation modules of these systems. The findings suggest directions for future improvements to visual-inertial SLAM.

* 8 pages, 7 figures. Accepted for publication in ICRA 2020


Good Feature Matching: Towards Accurate, Robust VO/VSLAM with Low Latency

Jan 03, 2020

Yipu Zhao, Patricio A. Vela


Abstract: Analysis of state-of-the-art VO/VSLAM systems exposes a gap in balancing performance (accuracy & robustness) and efficiency (latency). Feature-based systems exhibit good performance, yet have higher latency due to explicit data association; direct & semi-direct systems have lower latency, but are inapplicable in some target scenarios or exhibit lower accuracy than feature-based ones. This paper aims to fill the performance-efficiency gap with an enhancement applied to feature-based VSLAM. We present good feature matching, an active map-to-frame feature matching method. Feature matching effort is tied to submatrix selection, which has combinatorial time complexity and requires choosing a scoring metric. Via simulation, the Max-logDet matrix-revealing metric is shown to perform best. For real-time applicability, the combination of deterministic selection and randomized acceleration is studied. The proposed algorithm is integrated into monocular & stereo feature-based VSLAM systems. Extensive evaluations on multiple benchmarks and compute hardware quantify the latency reduction and the preservation of accuracy & robustness.

* Accepted as a Regular Paper to the IEEE Transactions on Robotics Journal
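The abstract mentions combining deterministic selection with randomized acceleration for Max-logDet submatrix selection. A well-known randomized scheme for this kind of greedy maximization is stochastic greedy (also called "lazier than lazy greedy"), which evaluates only a random sample of candidates per round. The sketch below assumes that scheme, a small `prior * I` regularizer to keep the determinant positive, and one Jacobian row per feature (real features contribute 2-row blocks); it is an illustration, not the paper's implementation, and `stochastic_greedy_max_logdet` is a hypothetical name.

```python
import math
import random


def stochastic_greedy_max_logdet(rows, k, eps=0.1, prior=1e-3, rng=None):
    """Randomized greedy Max-logDet row selection (stochastic greedy sketch).

    Each round evaluates only a random sample of ceil((n / k) * ln(1 / eps))
    remaining candidates, then adds the best sampled row. The gain of row h is
    log(1 + h^T A^{-1} h), and A^{-1} is kept current via Sherman-Morrison.
    """
    if not rows or k <= 0:
        return []
    rng = rng or random.Random()
    n, dim = len(rows), len(rows[0])
    k = min(k, n)
    sample_size = min(n, math.ceil((n / k) * math.log(1.0 / eps)))
    a_inv = [[(1.0 / prior if i == j else 0.0) for j in range(dim)] for i in range(dim)]
    remaining = list(range(n))
    chosen = []
    for _ in range(k):
        sample = rng.sample(remaining, min(sample_size, len(remaining)))
        best, best_q, best_ah = None, -1.0, None
        for idx in sample:
            h = rows[idx]
            ah = [sum(a_inv[i][j] * h[j] for j in range(dim)) for i in range(dim)]  # A^{-1} h
            q = sum(hi * ahi for hi, ahi in zip(h, ah))  # gain = log1p(q), monotone in q
            if q > best_q:
                best, best_q, best_ah = idx, q, ah
        # Sherman-Morrison: (A + h h^T)^{-1} = A^{-1} - (A^{-1} h)(A^{-1} h)^T / (1 + q)
        a_inv = [[a_inv[i][j] - best_ah[i] * best_ah[j] / (1.0 + best_q) for j in range(dim)]
                 for i in range(dim)]
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

With `eps` close to 0 the sample covers every candidate and the method reduces to plain greedy; larger `eps` trades selection quality for fewer gain evaluations per round.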


Autonomous, Monocular, Vision-Based Snake Robot Navigation and Traversal of Cluttered Environments using Rectilinear Gait Motion

Aug 19, 2019

Alexander H. Chang, Shiyu Feng, Yipu Zhao, Justin S. Smith, Patricio A. Vela


Abstract: Rectilinear forms of snake-like robotic locomotion are anticipated to be advantageous in obstacle-strewn scenarios characterizing urban disaster zones, subterranean collapses, and other natural environments. The elongated, laterally narrow footprint associated with these motion strategies is well suited to traversing confined spaces and narrow pathways. Navigation and path planning in the absence of global sensing, however, remain a pivotal challenge to be addressed prior to practical deployment of these robotic mechanisms. Several challenges related to visual processing and localization must be resolved to enable navigation. As a first pass in this direction, we mount a wireless, monocular color camera on the head of a robotic snake. Visual odometry and mapping from ORB-SLAM permit self-localization in planar, obstacle-strewn environments. Ground plane traversability segmentation in conjunction with perception-space collision detection permits path planning for navigation. A previously presented dynamical reduction of rectilinear snake locomotion to a non-holonomic kinematic vehicle informs both SLAM and planning. The simplified motion model is then applied to track planned trajectories through an obstacle configuration. This navigational framework enables a snake-like robotic platform to autonomously navigate and traverse unknown scenarios with only monocular vision.
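The abstract reduces rectilinear snake locomotion to a non-holonomic kinematic vehicle. The sketch below assumes the simplest such model, a unicycle with forward speed v and turn rate omega integrated exactly for piecewise-constant inputs, plus a hypothetical proportional heading controller that drives toward a waypoint. Speeds, gains, and the names `unicycle_step` and `drive_to` are illustrative, not values or functions from the paper.

```python
import math


def unicycle_step(pose, v, omega, dt):
    """Exact integration of unicycle kinematics for constant (v, omega) over dt."""
    x, y, th = pose
    if abs(omega) < 1e-9:  # straight-line motion
        return (x + v * dt * math.cos(th), y + v * dt * math.sin(th), th)
    th2 = th + omega * dt  # otherwise the vehicle moves along a circular arc
    return (x + v / omega * (math.sin(th2) - math.sin(th)),
            y - v / omega * (math.cos(th2) - math.cos(th)),
            th2)


def drive_to(pose, goal, v=0.05, k_heading=1.5, dt=0.1, tol=0.02, max_steps=10000):
    """Hypothetical waypoint follower: constant speed, turn rate proportional to heading error.

    Returns (final_pose, reached).
    """
    for _ in range(max_steps):
        dx, dy = goal[0] - pose[0], goal[1] - pose[1]
        if math.hypot(dx, dy) < tol:
            return pose, True
        err = math.atan2(dy, dx) - pose[2]
        err = math.atan2(math.sin(err), math.cos(err))  # wrap to (-pi, pi]
        pose = unicycle_step(pose, v, k_heading * err, dt)
    return pose, False
```

Starting at the origin facing +x, `drive_to((0.0, 0.0, 0.0), (1.0, 0.5))` turns toward the waypoint and stops within 2 cm of it; the non-holonomic constraint shows up as the curved approach, since the vehicle cannot move sideways.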


Characterizing SLAM Benchmarks and Methods for the Robust Perception Age

May 19, 2019

Wenkai Ye, Yipu Zhao, Patricio A. Vela


Abstract: The diversity of SLAM benchmarks affords extensive testing of SLAM algorithms to understand their performance, individually or in relative terms. However, the ad-hoc creation of these benchmarks does not necessarily illuminate the particular weak points of a SLAM algorithm when performance is evaluated. In this paper, we propose using a decision tree to identify benchmark properties that are challenging for state-of-the-art SLAM algorithms, and to identify the SLAM pipeline components that are important for handling these challenges. Establishing which factors of a particular sequence lead to tracking failure or degradation relative to these characteristics is important if we are to arrive at a strong understanding of the core computational needs of a robust SLAM algorithm. Likewise, we argue that it is important to profile the computational performance of the individual SLAM components when benchmarking. In particular, we advocate the use of time dilation during ROS bag playback, which we refer to as slo-mo playback. Using slo-mo to benchmark SLAM instantiations can provide clues to how SLAM implementations should be improved at the computational component level. Three prevalent VO/SLAM algorithms and two low-latency algorithms of our own are tested on selected typical sequences, generated from the benchmark characterization, to further demonstrate the benefits of computationally efficient components.

* 7 pages, 5 figures, accepted at ICRA 2019 Workshop on Dataset Generation and Benchmarking of SLAM Algorithms for Robotics and VR/AR
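To see why time-dilated playback helps, consider a simplified component that processes a frame only when idle and drops frames that arrive while it is busy (a rough stand-in for a subscriber with a short queue). Slowing playback by a factor (for ROS 1 bags, the `rosbag play --rate` option) gives the component more wall-clock time per frame, separating latency-induced failures from algorithmic ones. The function `frames_processed` and its drop policy are assumptions for illustration.

```python
def frames_processed(num_frames: int, frame_period: float, proc_time: float, rate: float) -> int:
    """Count frames a single-threaded component handles during time-dilated playback.

    Frames are played back with timestamps scaled by 1 / rate (rate < 1 is slo-mo,
    e.g. rate=0.5 plays at half speed). The component processes a frame only if
    it is idle when the frame arrives; frames arriving while it is busy are dropped.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    free_at = float("-inf")  # wall-clock time when the component becomes idle
    processed = 0
    for k in range(num_frames):
        arrival = k * frame_period / rate
        if arrival >= free_at:
            free_at = arrival + proc_time
            processed += 1
    return processed
```

With 20 Hz frames and 80 ms of processing per frame, real-time playback (rate 1.0) processes only 50 of 100 frames, while half-speed playback (rate 0.5) processes all 100.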


Good Feature Selection for Least Squares Pose Optimization in VO/VSLAM

May 19, 2019

Yipu Zhao, Patricio A. Vela


Abstract: This paper aims to select the features that contribute most to pose estimation in VO/VSLAM. Unlike existing feature selection works that focus only on efficiency, our method significantly improves the accuracy of pose tracking while introducing little overhead. By studying the impact of feature selection on least squares pose optimization, we demonstrate that accuracy can be improved via good feature selection. To that end, we introduce the Max-logDet metric to guide feature selection; it is connected to the conditioning of the least squares pose optimization problem. We then describe an efficient algorithm for approximately solving the NP-hard Max-logDet problem. Integrating Max-logDet feature selection into a state-of-the-art visual SLAM system leads to accuracy improvements with low overhead, as demonstrated via evaluation on a public benchmark.

* 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 1183-1189)
* 7 pages, 4 figures, published at IROS 2018
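As a sketch of a greedy approximation to the Max-logDet problem described above (the paper's exact algorithm may differ), the code below picks rows of a measurement Jacobian one at a time so as to maximize log det of the resulting information matrix. It assumes one Jacobian row per feature and a small `prior * I` regularizer; the matrix determinant lemma gives each candidate's gain in closed form, and the Sherman-Morrison formula keeps the inverse current. `greedy_max_logdet` is a hypothetical name.

```python
import math


def greedy_max_logdet(rows: list[list[float]], k: int, prior: float = 1e-3) -> list[int]:
    """Greedily pick k rows (e.g. per-feature Jacobian rows) maximizing log det(A).

    A = prior * I + sum of h h^T over selected rows h. Adding h raises log det(A)
    by log(1 + h^T A^{-1} h) (matrix determinant lemma), and A^{-1} is updated
    with the Sherman-Morrison formula, so no determinant is ever recomputed.
    """
    if not rows or k <= 0:
        return []
    dim = len(rows[0])
    a_inv = [[(1.0 / prior if i == j else 0.0) for j in range(dim)] for i in range(dim)]
    chosen: list[int] = []
    candidates = set(range(len(rows)))
    for _ in range(min(k, len(rows))):
        best, best_gain, best_ah = None, -math.inf, None
        for idx in sorted(candidates):  # sorted order makes ties deterministic
            h = rows[idx]
            ah = [sum(a_inv[i][j] * h[j] for j in range(dim)) for i in range(dim)]  # A^{-1} h
            gain = math.log1p(sum(hi * ahi for hi, ahi in zip(h, ah)))
            if gain > best_gain:
                best, best_gain, best_ah = idx, gain, ah
        h = rows[best]
        denom = 1.0 + sum(hi * ahi for hi, ahi in zip(h, best_ah))
        # (A + h h^T)^{-1} = A^{-1} - (A^{-1} h)(A^{-1} h)^T / (1 + h^T A^{-1} h)
        a_inv = [[a_inv[i][j] - best_ah[i] * best_ah[j] / denom for j in range(dim)]
                 for i in range(dim)]
        chosen.append(best)
        candidates.remove(best)
    return chosen
```

On rows [1, 0], [0.9, 0.1], and [0, 1] with k = 2, it selects rows 0 and 2: the nearly collinear row 1 adds little information once row 0 is chosen, which is what a conditioning-aware metric should penalize.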
