Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Shunbo Zhou

Dynamic Mixture of Progressive Parameter-Efficient Expert Library for Lifelong Robot Learning

Jun 06, 2025

Yuheng Lei, Sitong Mao, Shunbo Zhou, Hongyuan Zhang, Xuelong Li, Ping Luo

Abstract:A generalist agent must continuously learn and adapt throughout its lifetime, achieving efficient forward transfer while minimizing catastrophic forgetting. Previous work within the dominant pretrain-then-finetune paradigm has explored parameter-efficient fine-tuning for single-task adaptation, effectively steering a frozen pretrained model with a small number of parameters. However, in the context of lifelong learning, these methods rely on the impractical assumption of a test-time task identifier and restrict knowledge sharing among isolated adapters. To address these limitations, we propose Dynamic Mixture of Progressive Parameter-Efficient Expert Library (DMPEL) for lifelong robot learning. DMPEL progressively learn a low-rank expert library and employs a lightweight router to dynamically combine experts into an end-to-end policy, facilitating flexible behavior during lifelong adaptation. Moreover, by leveraging the modular structure of the fine-tuned parameters, we introduce coefficient replay to guide the router in accurately retrieving frozen experts for previously encountered tasks, thereby mitigating catastrophic forgetting. This method is significantly more storage- and computationally-efficient than applying demonstration replay to the entire policy. Extensive experiments on the lifelong manipulation benchmark LIBERO demonstrate that our framework outperforms state-of-the-art lifelong learning methods in success rates across continual adaptation, while utilizing minimal trainable parameters and storage.

Via

Access Paper or Ask Questions

OccluGaussian: Occlusion-Aware Gaussian Splatting for Large Scene Reconstruction and Rendering

Mar 20, 2025

Shiyong Liu, Xiao Tang, Zhihao Li, Yingfan He, Chongjie Ye, Jianzhuang Liu, Binxiao Huang, Shunbo Zhou, Xiaofei Wu

Abstract:In large-scale scene reconstruction using 3D Gaussian splatting, it is common to partition the scene into multiple smaller regions and reconstruct them individually. However, existing division methods are occlusion-agnostic, meaning that each region may contain areas with severe occlusions. As a result, the cameras within those regions are less correlated, leading to a low average contribution to the overall reconstruction. In this paper, we propose an occlusion-aware scene division strategy that clusters training cameras based on their positions and co-visibilities to acquire multiple regions. Cameras in such regions exhibit stronger correlations and a higher average contribution, facilitating high-quality scene reconstruction. We further propose a region-based rendering technique to accelerate large scene rendering, which culls Gaussians invisible to the region where the viewpoint is located. Such a technique significantly speeds up the rendering without compromising quality. Extensive experiments on multiple large scenes show that our method achieves superior reconstruction results with faster rendering speed compared to existing state-of-the-art approaches. Project page: https://occlugaussian.github.io.

* Project website: https://occlugaussian.github.io

Via

Access Paper or Ask Questions

Scale Disparity of Instances in Interactive Point Cloud Segmentation

Jul 19, 2024

Chenrui Han, Xuan Yu, Yuxuan Xie, Yili Liu, Sitong Mao, Shunbo Zhou, Rong Xiong, Yue Wang

Abstract:Interactive point cloud segmentation has become a pivotal task for understanding 3D scenes, enabling users to guide segmentation models with simple interactions such as clicks, therefore significantly reducing the effort required to tailor models to diverse scenarios and new categories. However, in the realm of interactive segmentation, the meaning of instance diverges from that in instance segmentation, because users might desire to segment instances of both thing and stuff categories that vary greatly in scale. Existing methods have focused on thing categories, neglecting the segmentation of stuff categories and the difficulties arising from scale disparity. To bridge this gap, we propose ClickFormer, an innovative interactive point cloud segmentation model that accurately segments instances of both thing and stuff categories. We propose a query augmentation module to augment click queries by a global query sampling strategy, thus maintaining consistent performance across different instance scales. Additionally, we employ global attention in the query-voxel transformer to mitigate the risk of generating false positives, along with several other network structure improvements to further enhance the model's segmentation performance. Experiments demonstrate that ClickFormer outperforms existing interactive point cloud segmentation methods across both indoor and outdoor datasets, providing more accurate segmentation results with fewer user clicks in an open-world setting.

* Accepted by 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems

Via

Access Paper or Ask Questions

Preserving Full Degradation Details for Blind Image Super-Resolution

Jul 02, 2024

Hongda Liu, Longguang Wang, Ye Zhang, Kaiwen Xue, Shunbo Zhou, Yulan Guo

Figure 1 for Preserving Full Degradation Details for Blind Image Super-Resolution

Figure 2 for Preserving Full Degradation Details for Blind Image Super-Resolution

Figure 3 for Preserving Full Degradation Details for Blind Image Super-Resolution

Figure 4 for Preserving Full Degradation Details for Blind Image Super-Resolution

Abstract:The performance of image super-resolution relies heavily on the accuracy of degradation information, especially under blind settings. Due to absence of true degradation models in real-world scenarios, previous methods learn distinct representations by distinguishing different degradations in a batch. However, the most significant degradation differences may provide shortcuts for the learning of representations such that subtle difference may be discarded. In this paper, we propose an alternative to learn degradation representations through reproducing degraded low-resolution (LR) images. By guiding the degrader to reconstruct input LR images, full degradation information can be encoded into the representations. In addition, we develop an energy distance loss to facilitate the learning of the degradation representations by introducing a bounded constraint. Experiments show that our representations can extract accurate and highly robust degradation information. Moreover, evaluations on both synthetic and real images demonstrate that our ReDSR achieves state-of-the-art performance for the blind SR tasks.

* 18 pages, 11 figures, 4 tables

Via

Access Paper or Ask Questions

PanopticRecon: Leverage Open-vocabulary Instance Segmentation for Zero-shot Panoptic Reconstruction

Jul 01, 2024

Xuan Yu, Yili Liu, Chenrui Han, Sitong Mao, Shunbo Zhou, Rong Xiong, Yiyi Liao, Yue Wang

Abstract:Panoptic reconstruction is a challenging task in 3D scene understanding. However, most existing methods heavily rely on pre-trained semantic segmentation models and known 3D object bounding boxes for 3D panoptic segmentation, which is not available for in-the-wild scenes. In this paper, we propose a novel zero-shot panoptic reconstruction method from RGB-D images of scenes. For zero-shot segmentation, we leverage open-vocabulary instance segmentation, but it has to face partial labeling and instance association challenges. We tackle both challenges by propagating partial labels with the aid of dense generalized features and building a 3D instance graph for associating 2D instance IDs. Specifically, we exploit partial labels to learn a classifier for generalized semantic features to provide complete labels for scenes with dense distilled features. Moreover, we formulate instance association as a 3D instance graph segmentation problem, allowing us to fully utilize the scene geometry prior and all 2D instance masks to infer global unique pseudo 3D instance ID. Our method outperforms state-of-the-art methods on the indoor dataset ScanNet V2 and the outdoor dataset KITTI-360, demonstrating the effectiveness of our graph segmentation method and reconstruction network.

Via

Access Paper or Ask Questions

SCALE: Self-Correcting Visual Navigation for Mobile Robots via Anti-Novelty Estimation

Apr 16, 2024

Chang Chen, Yuecheng Liu, Yuzheng Zhuang, Sitong Mao, Kaiwen Xue, Shunbo Zhou

Abstract:Although visual navigation has been extensively studied using deep reinforcement learning, online learning for real-world robots remains a challenging task. Recent work directly learned from offline dataset to achieve broader generalization in the real-world tasks, which, however, faces the out-of-distribution (OOD) issue and potential robot localization failures in a given map for unseen observation. This significantly drops the success rates and even induces collision. In this paper, we present a self-correcting visual navigation method, SCALE, that can autonomously prevent the robot from the OOD situations without human intervention. Specifically, we develop an image-goal conditioned offline reinforcement learning method based on implicit Q-learning (IQL). When facing OOD observation, our novel localization recovery method generates the potential future trajectories by learning from the navigation affordance, and estimates the future novelty via random network distillation (RND). A tailored cost function searches for the candidates with the least novelty that can lead the robot to the familiar places. We collect offline data and conduct evaluation experiments in three real-world urban scenarios. Experiment results show that SCALE outperforms the previous state-of-the-art methods for open-world navigation with a unique capability of localization recovery, significantly reducing the need for human intervention. Code is available at https://github.com/KubeEdge4Robotics/ScaleNav.

* 7 pages, 5 figures, 2024 IEEE International Conference on Robotics and Automation

Via

Access Paper or Ask Questions

NF-Atlas: Multi-Volume Neural Feature Fields for Large Scale LiDAR Mapping

Apr 10, 2023

Xuan Yu, Yili Liu, Sitong Mao, Shunbo Zhou, Rong Xiong, Yiyi Liao, Yue Wang

Abstract:LiDAR Mapping has been a long-standing problem in robotics. Recent progress in neural implicit representation has brought new opportunities to robotic mapping. In this paper, we propose the multi-volume neural feature fields, called NF-Atlas, which bridge the neural feature volumes with pose graph optimization. By regarding the neural feature volume as pose graph nodes and the relative pose between volumes as pose graph edges, the entire neural feature field becomes both locally rigid and globally elastic. Locally, the neural feature volume employs a sparse feature Octree and a small MLP to encode the submap SDF with an option of semantics. Learning the map using this structure allows for end-to-end solving of maximum a posteriori (MAP) based probabilistic mapping. Globally, the map is built volume by volume independently, avoiding catastrophic forgetting when mapping incrementally. Furthermore, when a loop closure occurs, with the elastic pose graph based representation, only updating the origin of neural volumes is required without remapping. Finally, these functionalities of NF-Atlas are validated. Thanks to the sparsity and the optimization based formulation, NF-Atlas shows competitive performance in terms of accuracy, efficiency and memory usage on both simulation and real-world datasets.

Via

Access Paper or Ask Questions

Loop-Closure Detection Based on 3D Point Cloud Learning for Self-Driving Industry Vehicles

Apr 30, 2019

Zhe Liu, Chuanzhe Suo, Shunbo Zhou, Wen Chen, Hesheng Wang, Yun-Hui Liu

Figure 1 for Loop-Closure Detection Based on 3D Point Cloud Learning for Self-Driving Industry Vehicles

Figure 2 for Loop-Closure Detection Based on 3D Point Cloud Learning for Self-Driving Industry Vehicles

Figure 3 for Loop-Closure Detection Based on 3D Point Cloud Learning for Self-Driving Industry Vehicles

Figure 4 for Loop-Closure Detection Based on 3D Point Cloud Learning for Self-Driving Industry Vehicles

Abstract:Self-driving industry vehicle plays a key role in the industry automation and contributes to resolve the problems of the shortage and increasing cost in manpower. Place recognition and loop-closure detection are main challenges in the localization and navigation tasks, specially when industry vehicles work in large-scale complex environments, such as the logistics warehouse and the port terminal. In this paper, we resolve the loop-closure detection problem by developing a novel 3D point cloud learning network, an active super keyframe selection method and a coarse-to-fine sequence matching strategy. More specifically, we first propose a novel deep neural network to extract a global descriptors from the original large-scale 3D point cloud, then based on which, an environment analysis approach is presented to investigate the feature space distribution of the global descriptors and actively select several super keyframes. Finally, a coarse-to-fine sequence matching strategy, which includes a super keyframe based coarse matching stage and a local sequence matching stage, is presented to ensure the loop-closure detection accuracy and real-time performance simultaneously. The proposed network is evaluated in different datasets and obtains a substantial improvement against the state-of-the-art PointNetVLAD in place recognition tasks. Experiment results on a self-driving industry vehicle validate the effectiveness of the proposed loop-closure detection algorithm.

* 8 pages, 9 figures

Via

Access Paper or Ask Questions

Coordinating Large-Scale Robot Networks with Motion and Communication Uncertainties for Logistics Applications

Apr 02, 2019

Zhe Liu, Hesheng Wang, Shunbo Zhou, Yi Shen, Yun-Hui Liu

Figure 1 for Coordinating Large-Scale Robot Networks with Motion and Communication Uncertainties for Logistics Applications

Figure 2 for Coordinating Large-Scale Robot Networks with Motion and Communication Uncertainties for Logistics Applications

Figure 3 for Coordinating Large-Scale Robot Networks with Motion and Communication Uncertainties for Logistics Applications

Figure 4 for Coordinating Large-Scale Robot Networks with Motion and Communication Uncertainties for Logistics Applications

Abstract:In this paper, we focus on the problem of task allocation, cooperative path planning and motion coordination of the large-scale system with thousands of robots, aiming for practical applications in robotic warehouses and automated logistics systems. Particularly, we solve the life-long planning problem and guarantee the coordination performance of large-scale robot network in the presence of robot motion uncertainties and communication failures. A hierarchical planning and coordination structure is presented. The environment is divided into several sectors and a dynamic traffic heat-map is generated to describe the current sector-level traffic flow. In task planning level, a greedy task allocation method is implemented to assign the current task to the nearest free robot and the sector-level path is generated by comprehensively considering the traveling distance, the traffic heat-value distribution and the current robot/communication failures. In motion coordination level, local cooperative A* algorithm is implemented in each sector to generate the collision-free road-level path of each robot in the sector and the rolling planning structure is introduced to solve problems caused by motion and communication uncertainties. The effectiveness and practical applicability of the proposed approach are validated by large-scale simulations with more than one thousand robots and real laboratory experiments.

* A preliminary version of this work has been accepted by ICRA 2019

Via

Access Paper or Ask Questions

3D Point Cloud Learning for Large-scale Environment Analysis and Place Recognition

Dec 11, 2018

Zhe Liu, Shunbo Zhou, Chuanzhe Suo, Yingtian Liu, Hesheng Wang, Yun-Hui Liu

Figure 1 for 3D Point Cloud Learning for Large-scale Environment Analysis and Place Recognition

Figure 2 for 3D Point Cloud Learning for Large-scale Environment Analysis and Place Recognition

Figure 3 for 3D Point Cloud Learning for Large-scale Environment Analysis and Place Recognition

Figure 4 for 3D Point Cloud Learning for Large-scale Environment Analysis and Place Recognition

Abstract:In this paper, we develop a new deep neural network which can extract discriminative and generalizable global descriptors from the raw 3D point cloud. Specifically, two novel modules, Adaptive Local Feature Extraction and Graph-based Neighborhood Aggregation, are designed and integrated into our network. This contributes to extract the local features adequately, reveal the spatial distribution of the point cloud, and find out the local structure and neighborhood relations of each part in a large-scale point cloud with an end-to-end manner. Furthermore, we utilize the network output for point cloud based analysis and retrieval tasks to achieve large-scale place recognition and environmental analysis. We tested our approach on the Oxford RobotCar dataset. The results for place recognition increased the existing state-of-the-art result (PointNetVLAD) from 81.01% to 94.92%. Moreover, we present an application to analyze the large-scale environment by evaluating the uniqueness of each location in the map, which can be applied to localization and loop-closure tasks, which are crucial for robotics and self-driving applications.

Via

Access Paper or Ask Questions