Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Mingyu Fan

NVP-HRI: Zero Shot Natural Voice and Posture-based Human-Robot Interaction via Large Language Model

Mar 12, 2025

Yuzhi Lai, Shenghai Yuan, Youssef Nassar, Mingyu Fan, Thomas Weber, Matthias Rätsch

Abstract:Effective Human-Robot Interaction (HRI) is crucial for future service robots in aging societies. Existing solutions are biased toward only well-trained objects, creating a gap when dealing with new objects. Currently, HRI systems using predefined gestures or language tokens for pretrained objects pose challenges for all individuals, especially elderly ones. These challenges include difficulties in recalling commands, memorizing hand gestures, and learning new names. This paper introduces NVP-HRI, an intuitive multi-modal HRI paradigm that combines voice commands and deictic posture. NVP-HRI utilizes the Segment Anything Model (SAM) to analyze visual cues and depth data, enabling precise structural object representation. Through a pre-trained SAM network, NVP-HRI allows interaction with new objects via zero-shot prediction, even without prior knowledge. NVP-HRI also integrates with a large language model (LLM) for multimodal commands, coordinating them with object selection and scene distribution in real time for collision-free trajectory solutions. We also regulate the action sequence with the essential control syntax to reduce LLM hallucination risks. The evaluation of diverse real-world tasks using a Universal Robot showcased up to 59.2\% efficiency improvement over traditional gesture control, as illustrated in the video https://youtu.be/EbC7al2wiAc. Our code and design will be openly available at https://github.com/laiyuzhi/NVP-HRI.git.

* This work has been accepted for publication in ESWA @ 2025 Elsevier. Personal use of this material is permitted. Permission from Elsevier must be obtained for all other uses, including reprinting/redistribution, creating new works, or reuse of any copyrighted components of this work in other media

Via

Access Paper or Ask Questions

NMM-HRI: Natural Multi-modal Human-Robot Interaction with Voice and Deictic Posture via Large Language Model

Jan 01, 2025

Yuzhi Lai, Shenghai Yuan, Youssef Nassar, Mingyu Fan, Atmaraaj Gopal, Arihiro Yorita, Naoyuki Kubota, Matthias Rätsch

Abstract:Translating human intent into robot commands is crucial for the future of service robots in an aging society. Existing Human-Robot Interaction (HRI) systems relying on gestures or verbal commands are impractical for the elderly due to difficulties with complex syntax or sign language. To address the challenge, this paper introduces a multi-modal interaction framework that combines voice and deictic posture information to create a more natural HRI system. The visual cues are first processed by the object detection model to gain a global understanding of the environment, and then bounding boxes are estimated based on depth information. By using a large language model (LLM) with voice-to-text commands and temporally aligned selected bounding boxes, robot action sequences can be generated, while key control syntax constraints are applied to avoid potential LLM hallucination issues. The system is evaluated on real-world tasks with varying levels of complexity using a Universal Robots UR3e manipulator. Our method demonstrates significantly better performance in HRI in terms of accuracy and robustness. To benefit the research community and the general public, we will make our code and design open-source.

* Submitted into RAM

Via

Access Paper or Ask Questions

Sliding Sequential CVAE with Time Variant Socially-aware Rethinking for Trajectory Prediction

Oct 28, 2021

Hao Zhou, Dongchun Ren, Xu Yang, Mingyu Fan, Hai Huang

Figure 1 for Sliding Sequential CVAE with Time Variant Socially-aware Rethinking for Trajectory Prediction

Figure 2 for Sliding Sequential CVAE with Time Variant Socially-aware Rethinking for Trajectory Prediction

Figure 3 for Sliding Sequential CVAE with Time Variant Socially-aware Rethinking for Trajectory Prediction

Figure 4 for Sliding Sequential CVAE with Time Variant Socially-aware Rethinking for Trajectory Prediction

Abstract:Pedestrian trajectory prediction is a key technology in many applications such as video surveillance, social robot navigation, and autonomous driving, and significant progress has been made in this research topic. However, there remain two limitations of previous studies. First, with the continuation of time, the prediction error at each time step increases significantly, causing the final displacement error to be impossible to ignore. Second, the prediction results of multiple pedestrians might be impractical in the prediction horizon, i.e., the predicted trajectories might collide with each other. To overcome these limitations, this work proposes a novel trajectory prediction method called CSR, which consists of a cascaded conditional variational autoencoder (CVAE) module and a socially-aware regression module. The cascaded CVAE module first estimates the future trajectories in a sequential pattern. Specifically, each CVAE concatenates the past trajectories and the predicted points so far as the input and predicts the location at the following time step. Then, the socially-aware regression module generates offsets from the estimated future trajectories to produce the socially compliant final predictions, which are more reasonable and accurate results than the estimated trajectories. Moreover, considering the large model parameters of the cascaded CVAE module, a slide CVAE module is further exploited to improve the model efficiency using one shared CVAE, in a slidable manner. Experiments results demonstrate that the proposed method exhibits improvements over state-of-the-art method on the Stanford Drone Dataset (SDD) and ETH/UCY of approximately 38.0% and 22.2%, respectively.

Via

Access Paper or Ask Questions

An Efficient Generation Method based on Dynamic Curvature of the Reference Curve for Robust Trajectory Planning

Dec 29, 2020

Yuchen Sun, Dongchun Ren, Shiqi Lian, Mingyu Fan, Xiangyi Teng

Figure 1 for An Efficient Generation Method based on Dynamic Curvature of the Reference Curve for Robust Trajectory Planning

Figure 2 for An Efficient Generation Method based on Dynamic Curvature of the Reference Curve for Robust Trajectory Planning

Figure 3 for An Efficient Generation Method based on Dynamic Curvature of the Reference Curve for Robust Trajectory Planning

Figure 4 for An Efficient Generation Method based on Dynamic Curvature of the Reference Curve for Robust Trajectory Planning

Abstract:Trajectory planning is a fundamental task on various autonomous driving platforms, such as social robotics and self-driving cars. Many trajectory planning algorithms use a reference curve based Frenet frame with time to reduce the planning dimension. However, there is a common implicit assumption in classic trajectory planning approaches, which is that the generated trajectory should follow the reference curve continuously. This assumption is not always true in real applications and it might cause some undesired issues in planning. One issue is that the projection of the planned trajectory onto the reference curve maybe discontinuous. Then, some segments on the reference curve are not the image of any part of the planned path. Another issue is that the planned path might self-intersect when following a simple reference curve continuously. The generated trajectories are unnatural and suboptimal ones when these issues happen. In this paper, we firstly demonstrate these issues and then introduce an efficient trajectory generation method which uses a new transformation from the Cartesian frame to Frenet frames. Experimental results on a simulated street scenario demonstrated the effectiveness of the proposed method.

* no comments

Via

Access Paper or Ask Questions

Robust Trajectory Forecasting for Multiple Intelligent Agents in Dynamic Scene

May 27, 2020

Yanliang Zhu, Dongchun Ren, Mingyu Fan, Deheng Qian, Xin Li, Huaxia Xia

Figure 1 for Robust Trajectory Forecasting for Multiple Intelligent Agents in Dynamic Scene

Figure 2 for Robust Trajectory Forecasting for Multiple Intelligent Agents in Dynamic Scene

Figure 3 for Robust Trajectory Forecasting for Multiple Intelligent Agents in Dynamic Scene

Figure 4 for Robust Trajectory Forecasting for Multiple Intelligent Agents in Dynamic Scene

Abstract:Trajectory forecasting, or trajectory prediction, of multiple interacting agents in dynamic scenes, is an important problem for many applications, such as robotic systems and autonomous driving. The problem is a great challenge because of the complex interactions among the agents and their interactions with the surrounding scenes. In this paper, we present a novel method for the robust trajectory forecasting of multiple intelligent agents in dynamic scenes. The proposed method consists of three major interrelated components: an interaction net for global spatiotemporal interactive feature extraction, an environment net for decoding dynamic scenes (i.e., the surrounding road topology of an agent), and a prediction net that combines the spatiotemporal feature, the scene feature, the past trajectories of agents and some random noise for the robust trajectory prediction of agents. Experiments on pedestrian-walking and vehicle-pedestrian heterogeneous datasets demonstrate that the proposed method outperforms the state-of-the-art prediction methods in terms of prediction accuracy.

Via

Access Paper or Ask Questions

Intrinsic dimension estimation of data by principal component analysis

Feb 10, 2010

Mingyu Fan, Nannan Gu, Hong Qiao, Bo Zhang

Figure 1 for Intrinsic dimension estimation of data by principal component analysis

Figure 2 for Intrinsic dimension estimation of data by principal component analysis

Figure 3 for Intrinsic dimension estimation of data by principal component analysis

Figure 4 for Intrinsic dimension estimation of data by principal component analysis

Abstract:Estimating intrinsic dimensionality of data is a classic problem in pattern recognition and statistics. Principal Component Analysis (PCA) is a powerful tool in discovering dimensionality of data sets with a linear structure; it, however, becomes ineffective when data have a nonlinear structure. In this paper, we propose a new PCA-based method to estimate intrinsic dimension of data with nonlinear structures. Our method works by first finding a minimal cover of the data set, then performing PCA locally on each subset in the cover and finally giving the estimation result by checking up the data variance on all small neighborhood regions. The proposed method utilizes the whole data set to estimate its intrinsic dimension and is convenient for incremental learning. In addition, our new PCA procedure can filter out noise in data and converge to a stable estimation with the neighborhood region size increasing. Experiments on synthetic and real world data sets show effectiveness of the proposed method.

* 8 pages, submitted for publication

Via

Access Paper or Ask Questions

Isometric Multi-Manifolds Learning

Dec 03, 2009

Mingyu Fan, Hong Qiao, Bo Zhang

Figure 1 for Isometric Multi-Manifolds Learning

Figure 2 for Isometric Multi-Manifolds Learning

Figure 3 for Isometric Multi-Manifolds Learning

Figure 4 for Isometric Multi-Manifolds Learning

Abstract:Isometric feature mapping (Isomap) is a promising manifold learning method. However, Isomap fails to work on data which distribute on clusters in a single manifold or manifolds. Many works have been done on extending Isomap to multi-manifolds learning. In this paper, we first proposed a new multi-manifolds learning algorithm (M-Isomap) with help of a general procedure. The new algorithm preserves intra-manifold geodesics and multiple inter-manifolds edges precisely. Compared with previous methods, this algorithm can isometrically learn data distributed on several manifolds. Secondly, the original multi-cluster manifold learning algorithm first proposed in \cite{DCIsomap} and called D-C Isomap has been revised so that the revised D-C Isomap can learn multi-manifolds data. Finally, the features and effectiveness of the proposed multi-manifolds learning algorithms are demonstrated and compared through experiments.

Via

Access Paper or Ask Questions