Abstract: Camera-based 3D semantic scene completion (SSC) provides dense geometric and semantic perception for autonomous driving. However, images provide limited information, making the model susceptible to geometric ambiguity caused by occlusion and perspective distortion. Existing methods often lack explicit semantic modeling between objects, limiting their perception of 3D semantic context. To address these challenges, we propose VLScene: Vision-Language Guidance Distillation for Camera-based 3D Semantic Scene Completion. The key insight is to use a vision-language model to introduce high-level semantic priors that provide the object spatial context required for 3D scene understanding. Specifically, we design a vision-language guidance distillation process to enhance image features, which can effectively capture semantic knowledge from the surrounding environment and improve spatial context reasoning. In addition, we introduce a geometric-semantic sparse awareness mechanism to propagate geometric structures in the neighborhood and enhance semantic information through contextual sparse interactions. Experimental results demonstrate that VLScene ranks first on two challenging benchmarks, SemanticKITTI and SSCBench-KITTI-360, with mIoU scores of 17.52 and 19.10, respectively.
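The abstract does not state which loss the guidance distillation uses. As a rough illustration of feature distillation in general, the sketch below pulls each student (image) feature toward the matching teacher (vision-language) feature with a mean cosine-distance loss. The cosine-distance choice and the names `cosine_similarity` and `distillation_loss` are assumptions made for illustration, not the authors' formulation.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def distillation_loss(student_feats, teacher_feats):
    """Mean cosine distance (1 - cosine similarity) over paired feature vectors.

    Assumed, illustrative loss: 0 when every student feature points in the same
    direction as its teacher feature, 2 when every pair points in opposite
    directions.
    """
    if not student_feats or len(student_feats) != len(teacher_feats):
        raise ValueError("need two non-empty feature lists of equal length")
    distances = [
        1.0 - cosine_similarity(s, t)
        for s, t in zip(student_feats, teacher_feats)
    ]
    return sum(distances) / len(distances)
```

In a real training loop this term would be computed on tensors and added to the task loss; the pure-Python version only shows the shape of the computation.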
Abstract: With advances in modern intelligent technologies, mobile robots equipped with manipulators are increasingly operating in unstructured environments. These robots can plan sequences of actions for long-horizon tasks based on perceived information. In practice, however, the planned actions often fail because of discrepancies between the perceptual information used for planning and the actual conditions. In this paper, we introduce the Conditional Subtree (CSubBT), a general self-adjusting execution framework for mobile manipulation tasks based on Behavior Trees (BTs). CSubBT decomposes each symbolic action into sub-actions and uses BTs to control their execution, handling anomalies that arise during the process. CSubBT treats common anomalies as constraint non-satisfaction problems and, when an anomaly is detected, continues to guide the robot through the task by sampling new action parameters in the constraint space. We demonstrate the robustness of our framework through extensive manipulation experiments on different platforms, both in simulation and in real-world settings.
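The resample-on-anomaly idea can be sketched as a simple retry loop. This is a minimal sketch, not the CSubBT implementation: the function `execute_with_resampling`, the callbacks it takes, and the one-dimensional grasp-offset example are all hypothetical, and a plain loop stands in for the behavior-tree control flow.

```python
import random


def execute_with_resampling(sub_action, sample_params, constraints,
                            max_attempts=50, rng=None):
    """Retry a sub-action with freshly sampled parameters until it succeeds.

    sub_action(params) -> bool   : runs the sub-action, True on success
    sample_params(rng) -> params : draws candidate parameters
    constraints                  : predicates that params must satisfy
    Returns (params, attempts) on success, or (None, max_attempts) on failure.
    """
    rng = rng or random.Random(0)
    for attempt in range(1, max_attempts + 1):
        params = sample_params(rng)
        # Reject samples outside the constraint space before acting on them.
        if not all(check(params) for check in constraints):
            continue
        # An execution failure is treated as an anomaly: resample and retry.
        if sub_action(params):
            return params, attempt
    return None, max_attempts


# Hypothetical example: choose a grasp offset (metres) along an object edge.
def sample_offset(rng):
    return rng.uniform(-0.10, 0.10)


def within_reach(offset):
    return abs(offset) <= 0.08


def try_grasp(offset):
    # Assumed ground truth unknown at planning time: the left side is blocked.
    return offset > 0.02
```

Calling `execute_with_resampling(try_grasp, sample_offset, [within_reach])` returns the first sampled offset that satisfies the reach constraint and for which the grasp succeeds. In a behavior tree, the same pattern would typically appear as a fallback node whose recovery branch resamples parameters before the action is retried.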