Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jiangtao Gong

Bench2FreeAD: A Benchmark for Vision-based End-to-end Navigation in Unstructured Robotic Environments

Mar 15, 2025

Yuhang Peng, Sidong Wang, Jihaoyu Yang, Shilong Li, Han Wang, Jiangtao Gong

Abstract:Most current end-to-end (E2E) autonomous driving algorithms are built on standard vehicles in structured transportation scenarios, lacking exploration of robot navigation for unstructured scenarios such as auxiliary roads, campus roads, and indoor settings. This paper investigates E2E robot navigation in unstructured road environments. First, we introduce two data collection pipelines - one for real-world robot data and another for synthetic data generated using the Isaac Sim simulator, which together produce an unstructured robotics navigation dataset -- FreeWorld Dataset. Second, we fine-tuned an efficient E2E autonomous driving model -- VAD -- using our datasets to validate the performance and adaptability of E2E autonomous driving models in these environments. Results demonstrate that fine-tuning through our datasets significantly enhances the navigation potential of E2E autonomous driving models in unstructured robotic environments. Thus, this paper presents the first dataset targeting E2E robot navigation tasks in unstructured scenarios, and provides a benchmark based on vision-based E2E autonomous driving algorithms to facilitate the development of E2E navigation technology for logistics and service robots. The project is available on Github.

* 7 pages, 9 figures

Via

Access Paper or Ask Questions

A Comprehensive LLM-powered Framework for Driving Intelligence Evaluation

Mar 07, 2025

Shanhe You, Xuewen Luo, Xinhe Liang, Jiashu Yu, Chen Zheng, Jiangtao Gong

Figure 1 for A Comprehensive LLM-powered Framework for Driving Intelligence Evaluation

Figure 2 for A Comprehensive LLM-powered Framework for Driving Intelligence Evaluation

Figure 3 for A Comprehensive LLM-powered Framework for Driving Intelligence Evaluation

Abstract:Evaluation methods for autonomous driving are crucial for algorithm optimization. However, due to the complexity of driving intelligence, there is currently no comprehensive evaluation method for the level of autonomous driving intelligence. In this paper, we propose an evaluation framework for driving behavior intelligence in complex traffic environments, aiming to fill this gap. We constructed a natural language evaluation dataset of human professional drivers and passengers through naturalistic driving experiments and post-driving behavior evaluation interviews. Based on this dataset, we developed an LLM-powered driving evaluation framework. The effectiveness of this framework was validated through simulated experiments in the CARLA urban traffic simulator and further corroborated by human assessment. Our research provides valuable insights for evaluating and designing more intelligent, human-like autonomous driving agents. The implementation details of the framework and detailed information about the dataset can be found at Github.

* ICRA2025
* 8 pages, 3 figures

Via

Access Paper or Ask Questions

Large Language Models Powered Context-aware Motion Prediction

Mar 17, 2024

Xiaoji Zheng, Lixiu Wu, Zhijie Yan, Yuanrong Tang, Hao Zhao, Chen Zhong, Bokui Chen, Jiangtao Gong

Figure 1 for Large Language Models Powered Context-aware Motion Prediction

Figure 2 for Large Language Models Powered Context-aware Motion Prediction

Figure 3 for Large Language Models Powered Context-aware Motion Prediction

Figure 4 for Large Language Models Powered Context-aware Motion Prediction

Abstract:Motion prediction is among the most fundamental tasks in autonomous driving. Traditional methods of motion forecasting primarily encode vector information of maps and historical trajectory data of traffic participants, lacking a comprehensive understanding of overall traffic semantics, which in turn affects the performance of prediction tasks. In this paper, we utilized Large Language Models (LLMs) to enhance the global traffic context understanding for motion prediction tasks. We first conducted systematic prompt engineering, visualizing complex traffic environments and historical trajectory information of traffic participants into image prompts -- Transportation Context Map (TC-Map), accompanied by corresponding text prompts. Through this approach, we obtained rich traffic context information from the LLM. By integrating this information into the motion prediction model, we demonstrate that such context can enhance the accuracy of motion predictions. Furthermore, considering the cost associated with LLMs, we propose a cost-effective deployment strategy: enhancing the accuracy of motion prediction tasks at scale with 0.7\% LLM-augmented datasets. Our research offers valuable insights into enhancing the understanding of traffic scenes of LLMs and the motion prediction performance of autonomous driving.

* 6 pages,4 figures

Via

Access Paper or Ask Questions

Driving Style Alignment for LLM-powered Driver Agent

Mar 17, 2024

Ruoxuan Yang, Xinyue Zhang, Anais Fernandez-Laaksonen, Xin Ding, Jiangtao Gong

Abstract:Recently, LLM-powered driver agents have demonstrated considerable potential in the field of autonomous driving, showcasing human-like reasoning and decision-making abilities.However, current research on aligning driver agent behaviors with human driving styles remains limited, partly due to the scarcity of high-quality natural language data from human driving behaviors.To address this research gap, we propose a multi-alignment framework designed to align driver agents with human driving styles through demonstrations and feedback. Notably, we construct a natural language dataset of human driver behaviors through naturalistic driving experiments and post-driving interviews, offering high-quality human demonstrations for LLM alignment. The framework's effectiveness is validated through simulation experiments in the CARLA urban traffic simulator and further corroborated by human evaluations. Our research offers valuable insights into designing driving agents with diverse driving styles.The implementation of the framework and details of the dataset can be found at the link.

Via

Access Paper or Ask Questions

More Than Routing: Joint GPS and Route Modeling for Refine Trajectory Representation Learning

Feb 25, 2024

Zhipeng Ma, Zheyan Tu, Xinhai Chen, Yan Zhang, Deguo Xia, Guyue Zhou, Yilun Chen, Yu Zheng, Jiangtao Gong

Figure 1 for More Than Routing: Joint GPS and Route Modeling for Refine Trajectory Representation Learning

Figure 2 for More Than Routing: Joint GPS and Route Modeling for Refine Trajectory Representation Learning

Figure 3 for More Than Routing: Joint GPS and Route Modeling for Refine Trajectory Representation Learning

Figure 4 for More Than Routing: Joint GPS and Route Modeling for Refine Trajectory Representation Learning

Abstract:Trajectory representation learning plays a pivotal role in supporting various downstream tasks. Traditional methods in order to filter the noise in GPS trajectories tend to focus on routing-based methods used to simplify the trajectories. However, this approach ignores the motion details contained in the GPS data, limiting the representation capability of trajectory representation learning. To fill this gap, we propose a novel representation learning framework that Joint GPS and Route Modelling based on self-supervised technology, namely JGRM. We consider GPS trajectory and route as the two modes of a single movement observation and fuse information through inter-modal information interaction. Specifically, we develop two encoders, each tailored to capture representations of route and GPS trajectories respectively. The representations from the two modalities are fed into a shared transformer for inter-modal information interaction. Eventually, we design three self-supervised tasks to train the model. We validate the effectiveness of the proposed method on two real datasets based on extensive experiments. The experimental results demonstrate that JGRM outperforms existing methods in both road segment representation and trajectory representation tasks. Our source code is available at Anonymous Github.

Via

Access Paper or Ask Questions

Enable Natural Tactile Interaction for Robot Dog based on Large-format Distributed Flexible Pressure Sensors

Mar 14, 2023

Lishuang Zhan, Yancheng Cao, Qitai Chen, Haole Guo, Jiasi Gao, Yiyue Luo, Shihui Guo, Guyue Zhou, Jiangtao Gong

Figure 1 for Enable Natural Tactile Interaction for Robot Dog based on Large-format Distributed Flexible Pressure Sensors

Figure 2 for Enable Natural Tactile Interaction for Robot Dog based on Large-format Distributed Flexible Pressure Sensors

Figure 3 for Enable Natural Tactile Interaction for Robot Dog based on Large-format Distributed Flexible Pressure Sensors

Figure 4 for Enable Natural Tactile Interaction for Robot Dog based on Large-format Distributed Flexible Pressure Sensors

Abstract:Touch is an important channel for human-robot interaction, while it is challenging for robots to recognize human touch accurately and make appropriate responses. In this paper, we design and implement a set of large-format distributed flexible pressure sensors on a robot dog to enable natural human-robot tactile interaction. Through a heuristic study, we sorted out 81 tactile gestures commonly used when humans interact with real dogs and 44 dog reactions. A gesture classification algorithm based on ResNet is proposed to recognize these 81 human gestures, and the classification accuracy reaches 98.7%. In addition, an action prediction algorithm based on Transformer is proposed to predict dog actions from human gestures, reaching a 1-gram BLEU score of 0.87. Finally, we compare the tactile interaction with the voice interaction during a freedom human-robot-dog interactive playing study. The results show that tactile interaction plays a more significant role in alleviating user anxiety, stimulating user excitement and improving the acceptability of robot dogs.

* ICRA 2023
* 7 pages, 5 figures

Via

Access Paper or Ask Questions

"I am the follower, also the boss": Exploring Different Levels of Autonomy and Machine Forms of Guiding Robots for the Visually Impaired

Feb 07, 2023

Yan Zhang, Ziang Li, Haole Guo, Luyao Wang, Qihe Chen, Wenjie Jiang, Mingming Fan, Guyue Zhou, Jiangtao Gong

Figure 1 for "I am the follower, also the boss": Exploring Different Levels of Autonomy and Machine Forms of Guiding Robots for the Visually Impaired

Figure 2 for "I am the follower, also the boss": Exploring Different Levels of Autonomy and Machine Forms of Guiding Robots for the Visually Impaired

Figure 3 for "I am the follower, also the boss": Exploring Different Levels of Autonomy and Machine Forms of Guiding Robots for the Visually Impaired

Figure 4 for "I am the follower, also the boss": Exploring Different Levels of Autonomy and Machine Forms of Guiding Robots for the Visually Impaired

Abstract:Guiding robots, in the form of canes or cars, have recently been explored to assist blind and low vision (BLV) people. Such robots can provide full or partial autonomy when guiding. However, the pros and cons of different forms and autonomy for guiding robots remain unknown. We sought to fill this gap. We designed autonomy-switchable guiding robotic cane and car. We conducted a controlled lab-study (N=12) and a field study (N=9) on BLV. Results showed that full autonomy received better walking performance and subjective ratings in the controlled study, whereas participants used more partial autonomy in the natural environment as demanding more control. Besides, the car robot has demonstrated abilities to provide a higher sense of safety and navigation efficiency compared with the cane robot. Our findings offered empirical evidence about how the BLV community perceived different machine forms and autonomy, which can inform the design of assistive robots.

Via

Access Paper or Ask Questions

Can Quadruped Navigation Robots be Used as Guide Dogs?

Oct 18, 2022

Qihe Chen, Luyao Wang, Yan Zhang, Ziang Li, Tingmin Yan, Fan Wang, Guyue Zhou, Jiangtao Gong

Figure 1 for Can Quadruped Navigation Robots be Used as Guide Dogs?

Figure 2 for Can Quadruped Navigation Robots be Used as Guide Dogs?

Figure 3 for Can Quadruped Navigation Robots be Used as Guide Dogs?

Figure 4 for Can Quadruped Navigation Robots be Used as Guide Dogs?

Abstract:Bionic robots are generally considered to have strong flexibility, adaptability, and stability. Their bionic forms are more likely to interact emotionally with people, which means obvious advantages as socially assistive robots. However, it has not been widely concerned and verified in the blind and low-vision community. In this paper, we explored the guiding performance and experience of bionic quadruped robots compared to wheeled robots. We invited the visually impaired participants to complete a) the indoor straight & turn task and obstacle avoidance task in a laboratory environment; b) the outdoor real and complex environment. With the transition from indoor to outdoor, we found that the workload of the bionic quadruped robots changed to insignificant. Moreover, obvious temporal demand indoors changed to significant mental demand outdoors. Also, there was no significant advantage of quadruped robots in usability, trust, or satisfaction, which was amplified outdoors. We concluded that walking noise and the gait of quadruped robots would limit the guiding effect to a certain extent, and the empathetic effect of its zoomorphic form for visually impaired people could not be fully reflected. This paper provides evidence for the empirical research of bionic quadruped robots in the field of guiding VI people, pointing out their shortcomings in guiding performance and experience, and has good instructive value for the design of bionic guided robots in the future.

Via

Access Paper or Ask Questions

A High Fidelity Simulation Framework for Potential Safety Benefits Estimation of Cooperative Pedestrian Perception

Oct 18, 2022

Longrui Chen, Yan Zhang, Wenjie Jiang, Jiangtao Gong, Jiahao Shen, Mengdi Chu, Chuxuan Li, Yifeng Pan, Yifeng Shi, Nairui Luo(+4 more)

Figure 1 for A High Fidelity Simulation Framework for Potential Safety Benefits Estimation of Cooperative Pedestrian Perception

Figure 2 for A High Fidelity Simulation Framework for Potential Safety Benefits Estimation of Cooperative Pedestrian Perception

Figure 3 for A High Fidelity Simulation Framework for Potential Safety Benefits Estimation of Cooperative Pedestrian Perception

Figure 4 for A High Fidelity Simulation Framework for Potential Safety Benefits Estimation of Cooperative Pedestrian Perception

Abstract:This paper proposes a high-fidelity simulation framework that can estimate the potential safety benefits of vehicle-to-infrastructure (V2I) pedestrian safety strategies. This simulator can support cooperative perception algorithms in the loop by simulating the environmental conditions, traffic conditions, and pedestrian characteristics at the same time. Besides, the benefit estimation model applied in our framework can systematically quantify both the risk conflict (non-crash condition) and the severity of the pedestrian's injuries (crash condition). An experiment was conducted in this paper that built a digital twin of a crowded urban intersection in China. The result shows that our framework is efficient for safety benefit estimation of V2I pedestrian safety strategies.

Via

Access Paper or Ask Questions

Planning Assembly Sequence with Graph Transformer

Oct 12, 2022

Lin Ma, Jiangtao Gong, Hao Xu, Hao Chen, Hao Zhao, Wenbing Huang, Guyue Zhou

Figure 1 for Planning Assembly Sequence with Graph Transformer

Figure 2 for Planning Assembly Sequence with Graph Transformer

Figure 3 for Planning Assembly Sequence with Graph Transformer

Figure 4 for Planning Assembly Sequence with Graph Transformer

Abstract:Assembly sequence planning (ASP) is the essential process for modern manufacturing, proven to be NP-complete thus its effective and efficient solution has been a challenge for researchers in the field. In this paper, we present a graph-transformer based framework for the ASP problem which is trained and demonstrated on a self-collected ASP database. The ASP database contains a self-collected set of LEGO models. The LEGO model is abstracted to a heterogeneous graph structure after a thorough analysis of the original structure and feature extraction. The ground truth assembly sequence is first generated by brute-force search and then adjusted manually to in line with human rational habits. Based on this self-collected ASP dataset, we propose a heterogeneous graph-transformer framework to learn the latent rules for assembly planning. We evaluated the proposed framework in a series of experiment. The results show that the similarity of the predicted and ground truth sequences can reach 0.44, a medium correlation measured by Kendall's $\tau$. Meanwhile, we compared the different effects of node features and edge features and generated a feasible and reasonable assembly sequence as a benchmark for further research. Our data set and code is available on https://github.com/AIR-DISCOVER/ICRA\_ASP.

* Submitted to ICRA2023

Via

Access Paper or Ask Questions