Hanjing Ye

FlowPlan: Zero-Shot Task Planning with LLM Flow Engineering for Robotic Instruction Following

Mar 04, 2025

Zijun Lin, Chao Tang, Hanjing Ye, Hong Zhang

Abstract:Robotic instruction following tasks require seamless integration of visual perception, task planning, target localization, and motion execution. However, existing task planning methods for instruction following are either data-driven or underperform in zero-shot scenarios due to difficulties in grounding lengthy instructions into actionable plans under operational constraints. To address this, we propose FlowPlan, a structured multi-stage LLM workflow that elevates the zero-shot pipeline and bridges the performance gap between zero-shot and data-driven in-context learning methods. By decomposing the planning process into modular stages--task information retrieval, language-level reasoning, symbolic-level planning, and logical evaluation--FlowPlan generates logically coherent action sequences while adhering to operational constraints and further extracts contextual guidance for precise instance-level target localization. Benchmarked on ALFRED and validated in real-world applications, our method achieves competitive performance relative to data-driven in-context learning methods and demonstrates adaptability across diverse environments. This work advances zero-shot task planning in robotic systems without reliance on labeled data. Project website: https://instruction-following-project.github.io/.

* 8 pages, 5 figures

RPF-Search: Field-based Search for Robot Person Following in Unknown Dynamic Environments

Mar 04, 2025

Hanjing Ye, Kuanqi Cai, Yu Zhan, Bingyi Xia, Arash Ajoudani, Hong Zhang

Abstract:Autonomous robot person-following (RPF) systems are crucial for personal assistance and security but suffer from target loss due to occlusions in dynamic, unknown environments. Current methods rely on pre-built maps and assume static environments, limiting their effectiveness in real-world settings. There is a critical gap in re-finding targets under topographic (e.g., walls, corners) and dynamic (e.g., moving pedestrians) occlusions. In this paper, we propose a novel heuristic-guided search framework that dynamically builds environmental maps while following the target and resolves various occlusions by prioritizing high-probability areas for locating the target. For topographic occlusions, a belief-guided search field is constructed and used to evaluate the likelihood of the target's presence, while for dynamic occlusions, a fluid-field approach allows the robot to adaptively follow or overtake moving occluders. Past motion cues and environmental observations refine the search decision over time. Our results demonstrate that the proposed method outperforms existing approaches in terms of search efficiency and success rates, both in simulations and real-world tests. Our target search method enhances the adaptability and reliability of RPF systems in unknown and dynamic environments to support their use in real-world applications. Our code, video, experimental results and appendix are available at https://medlartea.github.io/rpf-search/.

* Under review

Monocular Person Localization under Camera Ego-motion

Mar 04, 2025

Yu Zhan, Hanjing Ye, Hong Zhang

Abstract:Localizing a person from a moving monocular camera is critical for Human-Robot Interaction (HRI). To estimate the 3D human position from a 2D image, existing methods either depend on the geometric assumption of a fixed camera or use a position regression model trained on datasets containing little camera ego-motion. These methods are vulnerable to severe camera ego-motion, resulting in inaccurate person localization. We consider person localization as a part of a pose estimation problem. By representing a human with a four-point model, our method jointly estimates the 2D camera attitude and the person's 3D location through optimization. Evaluations on both public datasets and real robot experiments demonstrate that our method outperforms baselines in person localization accuracy. Our method is further integrated into a person-following system and deployed on an agile quadruped robot.

* Under review

GV-Bench: Benchmarking Local Feature Matching for Geometric Verification of Long-term Loop Closure Detection

Jul 16, 2024

Jingwen Yu, Hanjing Ye, Jianhao Jiao, Ping Tan, Hong Zhang

Abstract:Visual loop closure detection is an important module in visual simultaneous localization and mapping (SLAM), which associates current camera observation with previously visited places. Loop closures correct drifts in trajectory estimation to build a globally consistent map. However, a false loop closure can be fatal, so verification is required as an additional step to ensure robustness by rejecting the false positive loops. Geometric verification has been a well-acknowledged solution that leverages spatial clues provided by local feature matching to find true positives. Existing feature matching methods focus on homography and pose estimation in long-term visual localization, lacking references for geometric verification. To fill the gap, this paper proposes a unified benchmark targeting geometric verification of loop closure detection under long-term conditional variations. Furthermore, we evaluate six representative local feature matching methods (handcrafted and learning-based) under the benchmark, with in-depth analysis for limitations and future directions.

* 9 pages, 11 figures, Accepted by IROS(2024)

HPPS: A Hierarchical Progressive Perception System for Luggage Trolley Detection and Localization at Airports

May 09, 2024

Zhirui Sun, Zhe Zhang, Jieting Zhao, Hanjing Ye, Jiankun Wang

Abstract:The robotic autonomous luggage trolley collection system employs robots to gather and transport scattered luggage trolleys at airports. However, existing methods for detecting and locating these luggage trolleys often fail when they are not fully visible. To address this, we introduce the Hierarchical Progressive Perception System (HPPS), which enhances the detection and localization of luggage trolleys under partial occlusion. The HPPS processes the luggage trolley's position and orientation separately, which requires only RGB images for labeling and training, eliminating the need for 3D coordinates and alignment. The HPPS can accurately determine the position of the luggage trolley with just one well-detected keypoint and estimate the luggage trolley's orientation when it is partially occluded. Once the luggage trolley's initial pose is detected, HPPS updates this information continuously to refine its accuracy until the robot begins grasping. The experiments on detection and localization demonstrate that HPPS is more reliable under partial occlusion compared to existing methods. Its effectiveness and robustness have also been confirmed through practical tests in actual luggage trolley collection tasks. A website about this work is available at HPPS.

Human Orientation Estimation under Partial Observation

Apr 22, 2024

Jieting Zhao, Hanjing Ye, Yu Zhan, Hong Zhang

Abstract:Reliable human orientation estimation (HOE) is critical for autonomous agents to understand human intention and perform human-robot interaction (HRI) tasks. Great progress has been made in HOE under full observation. However, existing methods easily make a wrong prediction under partial observation and assign it an unexpectedly high probability. To solve these problems, this study first develops a method that estimates orientation from the visible joints of a target person so that it can handle partial observation. Subsequently, we introduce a confidence-aware orientation estimation method, enabling more accurate orientation estimation and reasonable confidence estimation under partial observation. The effectiveness of our method is validated on both public and custom-built datasets, where it shows greatly improved accuracy and reliability in partial observation scenarios. In particular, we show in real experiments that our method can benefit the robustness and consistency of the robot person following (RPF) task.

* Submitted to IROS 2024

Person Re-Identification for Robot Person Following with Online Continual Learning

Sep 21, 2023

Hanjing Ye, Jieting Zhao, Yu Zhan, Weinan Chen, Li He, Hong Zhang

Abstract:Robot person following (RPF) is a crucial capability in human-robot interaction (HRI) applications, allowing a robot to persistently follow a designated person. In practical RPF scenarios, the person is often occluded by other objects or people. Consequently, it is necessary to re-identify the person when he/she re-appears within the robot's field of view. Previous person re-identification (ReID) approaches to person following rely on offline-trained features and short-term experiences. Such an approach i) has a limited capacity to generalize across scenarios; and ii) often fails to re-identify the person when his/her re-appearance is out of the learned domain represented by the short-term experiences. Based on this observation, in this work, we propose a ReID framework for RPF that leverages long-term experiences. The experiences are maintained by a loss-guided keyframe selection strategy to enable online continual learning of the appearance model. Our experiments demonstrate that even in the presence of severe appearance changes and distractions from visually similar people, the proposed method can still re-identify the person more accurately than the state-of-the-art methods.

* Under review

Robot Person Following Under Partial Occlusion

Feb 27, 2023

Hanjing Ye, Jieting Zhao, Yaling Pan, Weinan Chen, Li He, Hong Zhang

Abstract:Robot person following (RPF) is a capability that supports many useful human-robot-interaction (HRI) applications. However, existing solutions to person following often assume full observation of the tracked person. As a consequence, they cannot track the person reliably under partial occlusion, where the assumption of full observation is not satisfied. In this paper, we focus on the problem of robot person following under partial occlusion caused by a limited field of view of a monocular camera. Based on the key insight that it is possible to locate the target person when one or more of his/her joints are visible, we propose a method in which each visible joint contributes a location estimate of the followed person. Experiments on a public person-following dataset show that, even under partial occlusion, the proposed method can still locate the person more reliably than the existing SOTA methods. In addition, the application of our method is demonstrated in real experiments on a mobile robot.

* Accepted by ICRA 2023

Following Closely: A Robust Monocular Person Following System for Mobile Robot

Apr 25, 2022

Hanjing Ye, Jieting Zhao, Yaling Pan, Weinan Chen, Hong Zhang

Abstract:Monocular person following (MPF) is a capability that supports many useful applications of a mobile robot. However, existing MPF solutions are not completely satisfactory. Firstly, they often fail to track the target at a close distance, either because they are based on a visual servo or because they need the robot to observe the full body. Secondly, their target Re-IDentification (Re-ID) abilities are weak in cases of target appearance change and highly similar appearance of distracting people. To remove the assumption of full-body observation, we propose a width-based tracking module that relies on the target width, which can be observed even at a close distance. For handling issues related to appearance variation, we use a global CNN (convolutional neural network) descriptor to represent the target and a ridge regression model to learn a target appearance model online. We adopt a sampling strategy for online classifier learning, in which both long-term and short-term samples are involved. We evaluate our method on two datasets: a public person following dataset and a custom-built one with challenging target appearance and target distance. Our method achieves state-of-the-art (SOTA) results on both datasets. For the benefit of the community, we make public the dataset and the source code.

* Under review in 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2022)
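
The abstract above mentions a ridge regression model that learns a target appearance model from descriptors, using both long-term and short-term samples. The following is only a minimal sketch of closed-form ridge regression on toy data, not the authors' implementation: the function names (`fit_ridge`, `score`), the 8-dimensional random "descriptors", the +1/-1 labels, and the regularization value are all assumptions made for illustration.

```python
import numpy as np


def fit_ridge(features, labels, lam=1.0):
    """Closed-form ridge regression: w = (X^T X + lam * I)^-1 X^T y."""
    X = np.asarray(features, dtype=float)
    y = np.asarray(labels, dtype=float)
    d = X.shape[1]
    # Solve the regularized normal equations instead of forming an inverse.
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def score(w, feature):
    """Linear score; higher means 'more likely the target' in this toy setup."""
    return float(np.asarray(feature, dtype=float) @ w)


# Hypothetical stand-ins for CNN descriptors: target samples are labeled +1,
# distractor samples -1. Real descriptors would come from a network.
rng = np.random.default_rng(0)
long_term = rng.normal(loc=1.0, scale=0.1, size=(10, 8))    # older target samples
short_term = rng.normal(loc=1.0, scale=0.1, size=(10, 8))   # recent target samples
distractors = rng.normal(loc=-1.0, scale=0.1, size=(20, 8))

# Pool long-term and short-term samples, then refit the model.
X = np.vstack([long_term, short_term, distractors])
y = np.concatenate([np.ones(20), -np.ones(20)])
w = fit_ridge(X, y, lam=0.1)

print(score(w, np.ones(8)) > 0, score(w, -np.ones(8)) < 0)
```

In an online setting, one plausible design is to refit (or incrementally update) `w` whenever the sample pool changes; how the paper selects and weights its samples is not stated in the abstract.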

Mapping While Following: 2D LiDAR SLAM in Indoor Dynamic Environments with a Person Tracker

Apr 18, 2022

Hanjing Ye, Guangcheng Chen, Weinan Chen, Li He, Yisheng Guan, Hong Zhang

Abstract:2D LiDAR SLAM (Simultaneous Localization and Mapping) is widely used in indoor environments due to its stability and flexibility. However, its mapping procedure is usually operated by a joystick in static environments, while indoor environments are often dynamic, with moving objects such as people. The generated map, with noisy points due to the dynamic objects, is usually incomplete and distorted. To address this problem, we propose a framework for 2D-LiDAR-based SLAM without manual control that effectively excludes dynamic objects (people) and simplifies the process for a robot to map an environment. The framework includes three parts: people tracking, filtering, and following. We verify our proposed framework in experiments with two classic 2D-LiDAR-based SLAM algorithms in indoor environments. The results show that this framework is effective in handling dynamic objects and reducing the mapping error.

* ROBIO, 2021, pp. 826-832
* Presented at 2021 IEEE International Conference on Robotics and Biomimetics (ROBIO)
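
The filtering step named in the abstract above (excluding people from the scan before mapping) could be sketched roughly as follows. This is an illustrative assumption, not the paper's method: the function `filter_person_points`, the fixed exclusion radius, and the toy scan are hypothetical, and the tracked person position is assumed to be given in the same 2D frame as the scan points.

```python
import numpy as np


def filter_person_points(scan_xy, person_xy, radius=0.5):
    """Drop scan points within `radius` meters of the tracked person.

    scan_xy:   (N, 2) array of 2D LiDAR points.
    person_xy: (x, y) position of the tracked person in the same frame.
    """
    scan_xy = np.asarray(scan_xy, dtype=float)
    dist = np.linalg.norm(scan_xy - np.asarray(person_xy, dtype=float), axis=1)
    return scan_xy[dist > radius]


# Toy scan: two points belong to the person standing near (2.0, 0.0).
scan = np.array([[1.0, 0.0], [2.0, 0.1], [2.1, -0.1], [4.0, 3.0]])
static_points = filter_person_points(scan, person_xy=(2.0, 0.0), radius=0.5)
print(static_points)  # the two points near the person are removed
```

The remaining points would then be passed to the SLAM front end; choosing the radius trades off leftover person points against the loss of nearby static structure.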
