Abstract: In crowd navigation, the local goal plays a crucial role in trajectory initialization, optimization, and evaluation. When the global goal is distant, the robot's primary objective is to avoid collisions, so passing through an exact local goal point is less critical. Building on this observation, this work introduces the concept of goal lines, which extend the traditional local goal from a single point to multiple candidate lines. Combined with a topological map construction strategy that groups obstacles into clusters that are as convex as possible, a goal-adaptive navigation framework is proposed to efficiently plan multiple candidate trajectories. Simulations and experiments demonstrate that the proposed GA-TEB framework effectively prevents deadlocks, in which the robot freezes because no feasible trajectory exists in a crowded environment. In addition, the framework substantially increases planning frequency in scenarios with many non-convex obstacles, improving both robustness and safety.
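The abstract does not give GA-TEB's cost formulation. As an illustrative sketch only, the Python below assumes that goal lines are 2D line segments and that a trajectory's goal term is the distance from its endpoint to the nearest goal line. The helpers `point_to_segment_distance` and `goal_line_cost` are hypothetical. The sketch shows why a line relaxes the point constraint: any endpoint lying on a goal line incurs zero goal cost.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]  # hypothetical goal-line representation: two endpoints


def point_to_segment_distance(p: Point, seg: Segment) -> float:
    """Euclidean distance from point p to the closest point on segment seg."""
    (ax, ay), (bx, by) = seg
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        # Degenerate segment: behaves exactly like a single local goal point.
        return math.hypot(p[0] - ax, p[1] - ay)
    # Project p onto the infinite line, then clamp the parameter to the segment.
    t = ((p[0] - ax) * dx + (p[1] - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    cx, cy = ax + t * dx, ay + t * dy
    return math.hypot(p[0] - cx, p[1] - cy)


def goal_line_cost(endpoint: Point, goal_lines: List[Segment]) -> float:
    """Hypothetical goal term: distance from a trajectory endpoint to the nearest goal line."""
    return min(point_to_segment_distance(endpoint, seg) for seg in goal_lines)
```

For example, with a single goal line from (5, -2) to (5, 2), an endpoint at (5, 1.5) has a goal cost of 0.0. Measured against the point goal (5, 0), the same endpoint would cost 1.5.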
Abstract: 6D object pose estimation plays an essential role in many fields, particularly in grasping industrial workpieces. To address challenges such as rust, high reflectivity, and missing textures, this paper introduces PS6D, a point-cloud-based pose estimation framework. PS6D focuses on slender and multi-symmetric objects. It extracts multi-scale features through an attention-guided feature extraction module. It then designs a symmetry-aware rotation loss and a center-distance-sensitive translation loss to regress each point's pose with respect to the centroid of its instance. Finally, it uses a two-stage clustering method to complete instance segmentation and pose estimation. Objects from the Siléane and IPA datasets, together with typical workpieces from industrial practice, are used to generate data and evaluate the algorithm. Compared with the state-of-the-art approach, PS6D improves F$_{1_{inst}}$ by 11.5% and recall by 14.8%. The core of PS6D has been deployed in Mech-Mind's software, where it achieves a 91.7% success rate in bin-picking experiments, demonstrating its applicability to industrial pose estimation tasks.
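The abstract does not specify PS6D's rotation loss. A common way to make a rotation loss symmetry-aware is to take the minimum error over all symmetry-equivalent ground-truth rotations. The NumPy sketch below shows that generic idea for a hypothetical object with 2-fold symmetry about its local z-axis. `rot_z`, `SYMMETRIES`, and the use of the geodesic angle as the error are illustrative choices, not PS6D's formulation.

```python
import numpy as np


def rot_z(theta: float) -> np.ndarray:
    """Rotation matrix for an angle theta (radians) about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def geodesic_angle(r_a: np.ndarray, r_b: np.ndarray) -> float:
    """Angle (radians) of the relative rotation r_a^T r_b, using trace(R) = 1 + 2 cos(angle)."""
    cos_angle = (np.trace(r_a.T @ r_b) - 1.0) / 2.0
    # Clip to guard against floating-point values slightly outside [-1, 1].
    return float(np.arccos(np.clip(cos_angle, -1.0, 1.0)))


def symmetry_aware_rotation_loss(r_pred: np.ndarray, r_gt: np.ndarray, symmetries) -> float:
    """Minimum geodesic error over all symmetry-equivalent ground-truth rotations."""
    return min(geodesic_angle(r_pred, r_gt @ s) for s in symmetries)


# Hypothetical object: indistinguishable after a 180-degree turn about its local z-axis.
SYMMETRIES = [np.eye(3), rot_z(np.pi)]
```

A prediction rotated by 180° about z relative to the ground truth has a naive geodesic error of π, but its symmetry-aware loss is approximately 0, so the network is not penalized for an ambiguity it cannot resolve from geometry. In a training pipeline, the same min-over-symmetries structure would be written in an autodiff framework.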
Abstract: Mutual gaze, in which people look at each other, is ubiquitous in daily interactions, and detecting it is of great significance for understanding human social scenes. Current mutual gaze detection methods are predominantly two-stage. Their inference speed is limited by the two-stage pipeline, and the performance of the second stage depends on the output of the first. In this paper, we propose MGTR (Mutual Gaze TRansformer), a novel one-stage framework that performs mutual gaze detection in an end-to-end manner. By designing mutual gaze instance triples, MGTR detects each human head bounding box and simultaneously infers mutual gaze relationships from global image information, which streamlines the whole process. Experimental results on two mutual gaze datasets show that our method accelerates mutual gaze detection without loss of performance. An ablation study shows that different components of MGTR capture different levels of semantic information in images. Code is available at https://github.com/Gmbition/MGTR
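The abstract does not describe the exact contents of an MGTR triple. As a hypothetical illustration only, the Python sketch below assumes that a triple pairs two head bounding boxes with a mutual-gaze label (for training targets) or a score (for predictions). It shows how one image's annotations could be enumerated into triples over all unordered head pairs and then filtered. `MutualGazeTriple`, `build_triples`, `select_mutual`, and the 0.5 threshold are illustrative names and values. The transformer that predicts triples and the matching of predictions to targets are not shown.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Set, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


@dataclass(frozen=True)
class MutualGazeTriple:
    """Hypothetical triple: two head boxes plus a mutual-gaze label or score."""
    head_a: Box
    head_b: Box
    mutual_gaze: float  # 1.0 / 0.0 for targets, a probability for predictions


def build_triples(heads: List[Box], mutual_pairs: Set[frozenset]) -> List[MutualGazeTriple]:
    """Enumerate every unordered pair of annotated heads as one labeled triple."""
    triples = []
    for i, j in combinations(range(len(heads)), 2):
        label = 1.0 if frozenset((i, j)) in mutual_pairs else 0.0
        triples.append(MutualGazeTriple(heads[i], heads[j], label))
    return triples


def select_mutual(triples: List[MutualGazeTriple], threshold: float = 0.5) -> List[MutualGazeTriple]:
    """Keep only triples whose mutual-gaze value reaches the threshold."""
    return [t for t in triples if t.mutual_gaze >= threshold]
```

With N annotated heads, this produces N(N-1)/2 triples. An image with three heads and one mutual-gaze pair therefore yields three triples, one of which is labeled positive.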