Abstract: Developing general robotic systems capable of manipulation in unstructured environments is a significant challenge. While Vision-Language Models (VLMs) excel at high-level commonsense reasoning, they lack the fine-grained 3D spatial understanding required for precise manipulation tasks. Fine-tuning VLMs on robotic datasets to create Vision-Language-Action Models (VLAs) is a potential solution, but it is hindered by high data collection costs and generalization issues. To address these challenges, we propose a novel object-centric representation that bridges the gap between the high-level reasoning of VLMs and the low-level precision required for manipulation. Our key insight is that an object's canonical space, defined by its functional affordances, provides a structured and semantically meaningful way to describe interaction primitives, such as points and directions. These primitives act as a bridge, translating the commonsense reasoning of VLMs into actionable 3D spatial constraints. Building on this, we introduce a dual closed-loop, open-vocabulary robotic manipulation system: one loop for high-level planning through primitive resampling, interaction rendering, and VLM checking, and another for low-level execution via 6D pose tracking. This design ensures robust, real-time control without requiring VLM fine-tuning. Extensive experiments demonstrate strong zero-shot generalization across diverse robotic manipulation tasks, highlighting the potential of this approach for automating large-scale simulation data generation.
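To make the idea of interaction primitives in an object's canonical space more concrete, here is a minimal sketch of how a point and a direction defined in that space could be mapped into the world frame from a tracked 6D pose. It assumes the pose is given as a rotation matrix `R` and a translation `t` from the canonical frame to the world frame. The function names and the example pose are hypothetical and are not taken from the paper.

```python
import math

def rotation_z(angle):
    """Rotation matrix for a rotation of `angle` radians about the z-axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]

def rotate(R, v):
    """Apply a 3x3 rotation matrix R to a 3-vector v."""
    return tuple(sum(R[i][j] * v[j] for j in range(3)) for i in range(3))

def point_to_world(R, t, p):
    """Map a point from the canonical frame to the world frame: R p + t."""
    r = rotate(R, p)
    return tuple(r[i] + t[i] for i in range(3))

def direction_to_world(R, d):
    """Map a direction from the canonical frame to the world frame: R d.

    Directions are free vectors, so the translation is not applied.
    """
    return rotate(R, d)

# Hypothetical example: an object rotated 90 degrees about z and shifted by t.
R = rotation_z(math.pi / 2)
t = (1.0, 2.0, 3.0)
grasp_point = point_to_world(R, t, (1.0, 0.0, 0.0))
approach_dir = direction_to_world(R, (1.0, 0.0, 0.0))
```

Because the primitives are stored in the canonical frame, only `R` and `t` need to be refreshed when the pose estimate changes. Under this assumption the world-frame constraints follow the object without re-querying the VLM.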
Abstract: Autonomous exploration is an important component of fast autonomous mapping and target search. However, most existing methods suffer from low efficiency caused by low-quality trajectories or back-and-forth maneuvers. To improve exploration efficiency in unknown environments, this paper proposes a fast autonomous exploration planner (FAEP). Unlike existing methods, we first design a novel frontier exploration sequence generation method that yields a more reasonable exploration path by considering not only flight-level but also frontier-level factors in the asymmetric traveling salesman problem (ATSP). Then, according to the exploration sequence and the distribution of frontiers, an adaptive yaw planning method is proposed to cover more frontiers through yaw changes during an exploration journey. In addition, a dynamic replanning strategy is adopted to increase the speed and fluency of flight. We present extensive comparison and evaluation experiments in simulation environments. Experimental results show that the proposed exploration planner outperforms typical and state-of-the-art methods in terms of flight time and flight distance. Moreover, the effectiveness of the proposed method is further evaluated in real-world environments.
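As a rough illustration of ordering frontiers under an asymmetric cost, the sketch below brute-forces a small open-path ATSP starting from the current vehicle state. The cost used here (Euclidean distance plus a penalty on the yaw change needed on arrival) is an assumption chosen only to make the cost asymmetric. It is not FAEP's actual cost formulation, and all names are hypothetical.

```python
import itertools
import math

def wrap_angle(theta):
    """Wrap an angle to the interval [-pi, pi]."""
    return math.atan2(math.sin(theta), math.cos(theta))

def transition_cost(src, dst, yaw_weight=0.5):
    """Assumed asymmetric cost of flying from src to dst.

    Each state is (x, y, yaw). The cost is the travel distance plus a
    penalty on the difference between the direction of travel and the
    viewing yaw required at dst, so cost(a, b) != cost(b, a) in general.
    """
    dx, dy = dst[0] - src[0], dst[1] - src[1]
    dist = math.hypot(dx, dy)
    travel_dir = math.atan2(dy, dx)
    yaw_change = abs(wrap_angle(dst[2] - travel_dir))
    return dist + yaw_weight * yaw_change

def best_sequence(start, frontiers):
    """Exhaustively search visiting orders; only feasible for a few frontiers."""
    best_order, best_cost = None, math.inf
    for order in itertools.permutations(range(len(frontiers))):
        cost, current = 0.0, start
        for idx in order:
            cost += transition_cost(current, frontiers[idx])
            current = frontiers[idx]
        if cost < best_cost:
            best_order, best_cost = order, cost
    return list(best_order), best_cost

# Hypothetical frontier viewpoints along the x-axis, all looking along +x.
start = (0.0, 0.0, 0.0)
frontiers = [(10.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
order, cost = best_sequence(start, frontiers)
```

Exhaustive search grows factorially with the number of frontiers, so a practical planner would hand the same cost matrix to a dedicated ATSP solver. The sketch only shows how direction-dependent costs influence the visiting order.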
Abstract: In the peg insertion task, humans pay attention to the seam between the peg and the hole and try to fill it continuously using visual feedback. By imitating this behavior, we design architectures with position and orientation estimators based on the seam representation for pose alignment, which generalize to unseen peg geometries. By placing the estimators in a closed control loop with reinforcement learning, we further achieve higher or comparable success rate, efficiency, and robustness compared with the baseline methods. The policy is trained entirely in simulation without any manual intervention. To achieve sim-to-real transfer, a learnable segmentation module with automatic data collection and labeling can be easily trained to decouple perception from the policy, which helps the model trained in simulation quickly adapt to the real world with negligible effort. Results are presented in simulation and on a physical robot. Code, videos, and supplemental material are available at https://github.com/xieliang555/SFN.git
Abstract: Autonomous exploration is an important component of the autonomous operation of Unmanned Aerial Vehicles (UAVs). To improve the efficiency of the exploration process, this paper proposes a fast autonomous exploration planner (FAEP). We first design a novel frontier exploration sequence generation method that yields a more reasonable exploration path by incorporating not only flight-level but also frontier-level factors into the TSP. According to the exploration sequence and the distribution of frontiers, a two-stage heading planning strategy is proposed to cover more frontiers through heading changes during an exploration journey. To improve the stability of path searching, a guided kinodynamic path search based on a guiding path is devised. In addition, a dynamic start point selection method for replanning is adopted to increase the fluency of flight. We present extensive benchmark and real-world experiments. Experimental results show the superiority of the proposed exploration planner over typical and state-of-the-art methods.