Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jianhao Zheng

WildGS-SLAM: Monocular Gaussian Splatting SLAM in Dynamic Environments

Apr 04, 2025

Jianhao Zheng, Zihan Zhu, Valentin Bieri, Marc Pollefeys, Songyou Peng, Iro Armeni

Abstract:We present WildGS-SLAM, a robust and efficient monocular RGB SLAM system designed to handle dynamic environments by leveraging uncertainty-aware geometric mapping. Unlike traditional SLAM systems, which assume static scenes, our approach integrates depth and uncertainty information to enhance tracking, mapping, and rendering performance in the presence of moving objects. We introduce an uncertainty map, predicted by a shallow multi-layer perceptron and DINOv2 features, to guide dynamic object removal during both tracking and mapping. This uncertainty map enhances dense bundle adjustment and Gaussian map optimization, improving reconstruction accuracy. Our system is evaluated on multiple datasets and demonstrates artifact-free view synthesis. Results showcase WildGS-SLAM's superior performance in dynamic environments compared to state-of-the-art methods.

Via

Access Paper or Ask Questions

MAP-ADAPT: Real-Time Quality-Adaptive Semantic 3D Maps

Jun 09, 2024

Jianhao Zheng, Daniel Barath, Marc Pollefeys, Iro Armeni

Figure 1 for MAP-ADAPT: Real-Time Quality-Adaptive Semantic 3D Maps

Figure 2 for MAP-ADAPT: Real-Time Quality-Adaptive Semantic 3D Maps

Figure 3 for MAP-ADAPT: Real-Time Quality-Adaptive Semantic 3D Maps

Figure 4 for MAP-ADAPT: Real-Time Quality-Adaptive Semantic 3D Maps

Abstract:Creating 3D semantic reconstructions of environments is fundamental to many applications, especially when related to autonomous agent operation (e.g., goal-oriented navigation or object interaction and manipulation). Commonly, 3D semantic reconstruction systems capture the entire scene in the same level of detail. However, certain tasks (e.g., object interaction) require a fine-grained and high-resolution map, particularly if the objects to interact are of small size or intricate geometry. In recent practice, this leads to the entire map being in the same high-quality resolution, which results in increased computational and storage costs. To address this challenge, we propose MAP-ADAPT, a real-time method for quality-adaptive semantic 3D reconstruction using RGBD frames. MAP-ADAPT is the first adaptive semantic 3D mapping algorithm that, unlike prior work, generates directly a single map with regions of different quality based on both the semantic information and the geometric complexity of the scene. Leveraging a semantic SLAM pipeline for pose and semantic estimation, we achieve comparable or superior results to state-of-the-art methods on synthetic and real-world data, while significantly reducing storage and computation requirements.

Via

Access Paper or Ask Questions

TEAM: a parameter-free algorithm to teach collaborative robots motions from user demonstrations

Sep 14, 2022

Lorenzo Panchetti, Jianhao Zheng, Mohamed Bouri, Malcolm Mielle

Figure 1 for TEAM: a parameter-free algorithm to teach collaborative robots motions from user demonstrations

Figure 2 for TEAM: a parameter-free algorithm to teach collaborative robots motions from user demonstrations

Figure 3 for TEAM: a parameter-free algorithm to teach collaborative robots motions from user demonstrations

Figure 4 for TEAM: a parameter-free algorithm to teach collaborative robots motions from user demonstrations

Abstract:Collaborative robots (cobots) built to work alongside humans must be able to quickly learn new skills and adapt to new task configurations. Learning from demonstration (LfD) enables cobots to learn and adapt motions to different use conditions. However, state-of-the-art LfD methods require manually tuning intrinsic parameters and have rarely been used in industrial contexts without experts. In this paper, the development and implementation of a LfD framework for industrial applications with naive users is presented. We propose a parameter-free method based on probabilistic movement primitives, where all the parameters are pre-determined using Jensen-Shannon divergence and bayesian optimization; thus, users do not have to perform manual parameter tuning. This method learns motions from a small dataset of user demonstrations, and generalizes the motion to various scenarios and conditions. We evaluate the method extensively in two field tests: one where the cobot works on elevator door maintenance, and one where three Schindler workers teach the cobot tasks useful for their workflow. Errors between the cobot end-effector and target positions range from $0$ to $1.48\pm0.35$mm. For all tests, no task failures were reported. Questionnaires completed by the Schindler workers highlighted the method's ease of use, feeling of safety, and the accuracy of the reproduced motion. Our code and recorded trajectories are made available online for reproduction.

* 7 pages, 6 figures, submitted to ICRA 2023

Via

Access Paper or Ask Questions

CrossLoc: Scalable Aerial Localization Assisted by Multimodal Synthetic Data

Dec 16, 2021

Qi Yan, Jianhao Zheng, Simon Reding, Shanci Li, Iordan Doytchinov

Figure 1 for CrossLoc: Scalable Aerial Localization Assisted by Multimodal Synthetic Data

Figure 2 for CrossLoc: Scalable Aerial Localization Assisted by Multimodal Synthetic Data

Figure 3 for CrossLoc: Scalable Aerial Localization Assisted by Multimodal Synthetic Data

Figure 4 for CrossLoc: Scalable Aerial Localization Assisted by Multimodal Synthetic Data

Abstract:We present a visual localization system that learns to estimate camera poses in the real world with the help of synthetic data. Despite significant progress in recent years, most learning-based approaches to visual localization target at a single domain and require a dense database of geo-tagged images to function well. To mitigate the data scarcity issue and improve the scalability of the neural localization models, we introduce TOPO-DataGen, a versatile synthetic data generation tool that traverses smoothly between the real and virtual world, hinged on the geographic camera viewpoint. New large-scale sim-to-real benchmark datasets are proposed to showcase and evaluate the utility of the said synthetic data. Our experiments reveal that synthetic data generically enhances the neural network performance on real data. Furthermore, we introduce CrossLoc, a cross-modal visual representation learning approach to pose estimation that makes full use of the scene coordinate ground truth via self-supervision. Without any extra data, CrossLoc significantly outperforms the state-of-the-art methods and achieves substantially higher real-data sample efficiency. Our code is available at https://github.com/TOPO-EPFL/CrossLoc.

* Preprint. Our code is available at https://github.com/TOPO-EPFL/CrossLoc

Via

Access Paper or Ask Questions