Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Hang Qiu

Towards Natural Language Communication for Cooperative Autonomous Driving via Self-Play

May 23, 2025

Jiaxun Cui, Chen Tang, Jarrett Holtz, Janice Nguyen, Alessandro G. Allievi, Hang Qiu, Peter Stone

Abstract:Past work has demonstrated that autonomous vehicles can drive more safely if they communicate with one another than if they do not. However, their communication has often not been human-understandable. Using natural language as a vehicle-to-vehicle (V2V) communication protocol offers the potential for autonomous vehicles to drive cooperatively not only with each other but also with human drivers. In this work, we propose a suite of traffic tasks in autonomous driving where vehicles in a traffic scenario need to communicate in natural language to facilitate coordination in order to avoid an imminent collision and/or support efficient traffic flow. To this end, this paper introduces a novel method, LLM+Debrief, to learn a message generation and high-level decision-making policy for autonomous vehicles through multi-agent discussion. To evaluate LLM agents for driving, we developed a gym-like simulation environment that contains a range of driving scenarios. Our experimental results demonstrate that LLM+Debrief is more effective at generating meaningful and human-understandable natural language messages to facilitate cooperation and coordination than a zero-shot LLM agent. Our code and demo videos are available at https://talking-vehicles.github.io/.

Via

Access Paper or Ask Questions

FluidML: Fast and Memory Efficient Inference Optimization

Nov 14, 2024

Jinjie Liu, Hang Qiu

Figure 1 for FluidML: Fast and Memory Efficient Inference Optimization

Figure 2 for FluidML: Fast and Memory Efficient Inference Optimization

Figure 3 for FluidML: Fast and Memory Efficient Inference Optimization

Figure 4 for FluidML: Fast and Memory Efficient Inference Optimization

Abstract:Machine learning models deployed on edge devices have enabled numerous exciting new applications, such as humanoid robots, AR glasses, and autonomous vehicles. However, the computing resources available on these edge devices are not catching up with the ever-growing number of parameters in these models. As the models become bigger and more complicated, the novel yet sophisticated structure challenges the inference runtime optimization. We present FluidML, a generic runtime memory management and optimization framework that can flexibly transform the model execution blueprint to achieve faster and more memory-efficient inference. Evaluations across different platforms show that FluidML can consistently reduce the end-to-end inference latency by up to 25.38% for popular language models and reduce peak memory usage by up to 41.47%, compared to state-of-the-art approaches. FluidML is of ~30K line of codes, built for general-purpose usage, and will be released as an open-source inference runtime optimization framework to the community.

Via

Access Paper or Ask Questions

Can We Remove the Ground? Obstacle-aware Point Cloud Compression for Remote Object Detection

Oct 01, 2024

Pengxi Zeng, Alberto Presta, Jonah Reinis, Dinesh Bharadia, Hang Qiu, Pamela Cosman

Figure 1 for Can We Remove the Ground? Obstacle-aware Point Cloud Compression for Remote Object Detection

Figure 2 for Can We Remove the Ground? Obstacle-aware Point Cloud Compression for Remote Object Detection

Figure 3 for Can We Remove the Ground? Obstacle-aware Point Cloud Compression for Remote Object Detection

Figure 4 for Can We Remove the Ground? Obstacle-aware Point Cloud Compression for Remote Object Detection

Abstract:Efficient point cloud (PC) compression is crucial for streaming applications, such as augmented reality and cooperative perception. Classic PC compression techniques encode all the points in a frame. Tailoring compression towards perception tasks at the receiver side, we ask the question, "Can we remove the ground points during transmission without sacrificing the detection performance?" Our study reveals a strong dependency on the ground from state-of-the-art (SOTA) 3D object detection models, especially on those points below and around the object. In this work, we propose a lightweight obstacle-aware Pillar-based Ground Removal (PGR) algorithm. PGR filters out ground points that do not provide context to object recognition, significantly improving compression ratio without sacrificing the receiver side perception performance. Not using heavy object detection or semantic segmentation models, PGR is light-weight, highly parallelizable, and effective. Our evaluations on KITTI and Waymo Open Dataset show that SOTA detection models work equally well with PGR removing 20-30% of the points, with a speeding of 86 FPS.

* 7 Pages; submitted to ICRA 2025

Via

Access Paper or Ask Questions

CMP: Cooperative Motion Prediction with Multi-Agent Communication

Mar 26, 2024

Zhuoyuan Wu, Yuping Wang, Hengbo Ma, Zhaowei Li, Hang Qiu, Jiachen Li

Figure 1 for CMP: Cooperative Motion Prediction with Multi-Agent Communication

Figure 2 for CMP: Cooperative Motion Prediction with Multi-Agent Communication

Figure 3 for CMP: Cooperative Motion Prediction with Multi-Agent Communication

Figure 4 for CMP: Cooperative Motion Prediction with Multi-Agent Communication

Abstract:The confluence of the advancement of Autonomous Vehicles (AVs) and the maturity of Vehicle-to-Everything (V2X) communication has enabled the capability of cooperative connected and automated vehicles (CAVs). Building on top of cooperative perception, this paper explores the feasibility and effectiveness of cooperative motion prediction. Our method, CMP, takes LiDAR signals as input to enhance tracking and prediction capabilities. Unlike previous work that focuses separately on either cooperative perception or motion prediction, our framework, to the best of our knowledge, is the first to address the unified problem where CAVs share information in both perception and prediction modules. Incorporated into our design is the unique capability to tolerate realistic V2X bandwidth limitations and transmission delays, while dealing with bulky perception representations. We also propose a prediction aggregation module, which unifies the predictions obtained by different CAVs and generates the final prediction. Through extensive experiments and ablation studies, we demonstrate the effectiveness of our method in cooperative perception, tracking, and motion prediction tasks. In particular, CMP reduces the average prediction error by 17.2\% with fewer missing detections compared with the no cooperation setting. Our work marks a significant step forward in the cooperative capabilities of CAVs, showcasing enhanced performance in complex scenarios.

Via

Access Paper or Ask Questions

Embodied Understanding of Driving Scenarios

Mar 07, 2024

Yunsong Zhou, Linyan Huang, Qingwen Bu, Jia Zeng, Tianyu Li, Hang Qiu, Hongzi Zhu, Minyi Guo, Yu Qiao, Hongyang Li

Figure 1 for Embodied Understanding of Driving Scenarios

Figure 2 for Embodied Understanding of Driving Scenarios

Figure 3 for Embodied Understanding of Driving Scenarios

Figure 4 for Embodied Understanding of Driving Scenarios

Abstract:Embodied scene understanding serves as the cornerstone for autonomous agents to perceive, interpret, and respond to open driving scenarios. Such understanding is typically founded upon Vision-Language Models (VLMs). Nevertheless, existing VLMs are restricted to the 2D domain, devoid of spatial awareness and long-horizon extrapolation proficiencies. We revisit the key aspects of autonomous driving and formulate appropriate rubrics. Hereby, we introduce the Embodied Language Model (ELM), a comprehensive framework tailored for agents' understanding of driving scenes with large spatial and temporal spans. ELM incorporates space-aware pre-training to endow the agent with robust spatial localization capabilities. Besides, the model employs time-aware token selection to accurately inquire about temporal cues. We instantiate ELM on the reformulated multi-faced benchmark, and it surpasses previous state-of-the-art approaches in all aspects. All code, data, and models will be publicly shared.

* 43 pages, 16 figures

Via

Access Paper or Ask Questions

WOMD-LiDAR: Raw Sensor Dataset Benchmark for Motion Forecasting

Apr 07, 2023

Kan Chen, Runzhou Ge, Hang Qiu, Rami Ai-Rfou, Charles R. Qi, Xuanyu Zhou, Zoey Yang, Scott Ettinger, Pei Sun, Zhaoqi Leng(+5 more)

Figure 1 for WOMD-LiDAR: Raw Sensor Dataset Benchmark for Motion Forecasting

Figure 2 for WOMD-LiDAR: Raw Sensor Dataset Benchmark for Motion Forecasting

Figure 3 for WOMD-LiDAR: Raw Sensor Dataset Benchmark for Motion Forecasting

Figure 4 for WOMD-LiDAR: Raw Sensor Dataset Benchmark for Motion Forecasting

Abstract:Widely adopted motion forecasting datasets substitute the observed sensory inputs with higher-level abstractions such as 3D boxes and polylines. These sparse shapes are inferred through annotating the original scenes with perception systems' predictions. Such intermediate representations tie the quality of the motion forecasting models to the performance of computer vision models. Moreover, the human-designed explicit interfaces between perception and motion forecasting typically pass only a subset of the semantic information present in the original sensory input. To study the effect of these modular approaches, design new paradigms that mitigate these limitations, and accelerate the development of end-to-end motion forecasting models, we augment the Waymo Open Motion Dataset (WOMD) with large-scale, high-quality, diverse LiDAR data for the motion forecasting task. The new augmented dataset WOMD-LiDAR consists of over 100,000 scenes that each spans 20 seconds, consisting of well-synchronized and calibrated high quality LiDAR point clouds captured across a range of urban and suburban geographies (https://waymo.com/open/data/motion/). Compared to Waymo Open Dataset (WOD), WOMD-LiDAR dataset contains 100x more scenes. Furthermore, we integrate the LiDAR data into the motion forecasting model training and provide a strong baseline. Experiments show that the LiDAR data brings improvement in the motion forecasting task. We hope that WOMD-LiDAR will provide new opportunities for boosting end-to-end motion forecasting models.

* Dataset website: https://waymo.com/open/data/motion/

Via

Access Paper or Ask Questions

COOPERNAUT: End-to-End Driving with Cooperative Perception for Networked Vehicles

May 04, 2022

Jiaxun Cui, Hang Qiu, Dian Chen, Peter Stone, Yuke Zhu

Figure 1 for COOPERNAUT: End-to-End Driving with Cooperative Perception for Networked Vehicles

Figure 2 for COOPERNAUT: End-to-End Driving with Cooperative Perception for Networked Vehicles

Figure 3 for COOPERNAUT: End-to-End Driving with Cooperative Perception for Networked Vehicles

Figure 4 for COOPERNAUT: End-to-End Driving with Cooperative Perception for Networked Vehicles

Abstract:Optical sensors and learning algorithms for autonomous vehicles have dramatically advanced in the past few years. Nonetheless, the reliability of today's autonomous vehicles is hindered by the limited line-of-sight sensing capability and the brittleness of data-driven methods in handling extreme situations. With recent developments of telecommunication technologies, cooperative perception with vehicle-to-vehicle communications has become a promising paradigm to enhance autonomous driving in dangerous or emergency situations. We introduce COOPERNAUT, an end-to-end learning model that uses cross-vehicle perception for vision-based cooperative driving. Our model encodes LiDAR information into compact point-based representations that can be transmitted as messages between vehicles via realistic wireless channels. To evaluate our model, we develop AutoCastSim, a network-augmented driving simulation framework with example accident-prone scenarios. Our experiments on AutoCastSim suggest that our cooperative perception driving models lead to a 40% improvement in average success rate over egocentric driving models in these challenging driving situations and a 5 times smaller bandwidth requirement than prior work V2VNet. COOPERNAUT and AUTOCASTSIM are available at https://ut-austin-rpl.github.io/Coopernaut/.

Via

Access Paper or Ask Questions

Personalized Federated Learning of Driver Prediction Models for Autonomous Driving

Dec 02, 2021

Manabu Nakanoya, Junha Im, Hang Qiu, Sachin Katti, Marco Pavone, Sandeep Chinchali

Figure 1 for Personalized Federated Learning of Driver Prediction Models for Autonomous Driving

Figure 2 for Personalized Federated Learning of Driver Prediction Models for Autonomous Driving

Figure 3 for Personalized Federated Learning of Driver Prediction Models for Autonomous Driving

Figure 4 for Personalized Federated Learning of Driver Prediction Models for Autonomous Driving

Abstract:Autonomous vehicles (AVs) must interact with a diverse set of human drivers in heterogeneous geographic areas. Ideally, fleets of AVs should share trajectory data to continually re-train and improve trajectory forecasting models from collective experience using cloud-based distributed learning. At the same time, these robots should ideally avoid uploading raw driver interaction data in order to protect proprietary policies (when sharing insights with other companies) or protect driver privacy from insurance companies. Federated learning (FL) is a popular mechanism to learn models in cloud servers from diverse users without divulging private local data. However, FL is often not robust -- it learns sub-optimal models when user data comes from highly heterogeneous distributions, which is a key hallmark of human-robot interactions. In this paper, we present a novel variant of personalized FL to specialize robust robot learning models to diverse user distributions. Our algorithm outperforms standard FL benchmarks by up to 2x in real user studies that we conducted where human-operated vehicles must gracefully merge lanes with simulated AVs in the standard CARLA and CARLO AV simulators.

Via

Access Paper or Ask Questions

ML-EXray: Visibility into ML Deployment on the Edge

Nov 08, 2021

Hang Qiu, Ioanna Vavelidou, Jian Li, Evgenya Pergament, Pete Warden, Sandeep Chinchali, Zain Asgar, Sachin Katti

Figure 1 for ML-EXray: Visibility into ML Deployment on the Edge

Figure 2 for ML-EXray: Visibility into ML Deployment on the Edge

Figure 3 for ML-EXray: Visibility into ML Deployment on the Edge

Figure 4 for ML-EXray: Visibility into ML Deployment on the Edge

Abstract:Benefiting from expanding cloud infrastructure, deep neural networks (DNNs) today have increasingly high performance when trained in the cloud. Researchers spend months of effort competing for an extra few percentage points of model accuracy. However, when these models are actually deployed on edge devices in practice, very often, the performance can abruptly drop over 10% without obvious reasons. The key challenge is that there is not much visibility into ML inference execution on edge devices, and very little awareness of potential issues during the edge deployment process. We present ML-EXray, an end-to-end framework, which provides visibility into layer-level details of the ML execution, and helps developers analyze and debug cloud-to-edge deployment issues. More often than not, the reason for sub-optimal edge performance does not only lie in the model itself, but every operation throughout the data flow and the deployment process. Evaluations show that ML-EXray can effectively catch deployment issues, such as pre-processing bugs, quantization issues, suboptimal kernels, etc. Using ML-EXray, users need to write less than 15 lines of code to fully examine the edge deployment pipeline. Eradicating these issues, ML-EXray can correct model performance by up to 30%, pinpoint error-prone layers, and guide users to optimize kernel execution latency by two orders of magnitude. Code and APIs will be released as an open-source multi-lingual instrumentation library and a Python deployment validation library.

Via

Access Paper or Ask Questions

FedML: A Research Library and Benchmark for Federated Machine Learning

Jul 27, 2020

Chaoyang He, Songze Li, Jinhyun So, Mi Zhang, Hongyi Wang, Xiaoyang Wang, Praneeth Vepakomma, Abhishek Singh, Hang Qiu, Li Shen(+7 more)

Figure 1 for FedML: A Research Library and Benchmark for Federated Machine Learning

Figure 2 for FedML: A Research Library and Benchmark for Federated Machine Learning

Figure 3 for FedML: A Research Library and Benchmark for Federated Machine Learning

Figure 4 for FedML: A Research Library and Benchmark for Federated Machine Learning

Abstract:Federated learning is a rapidly growing research field in the machine learning domain. Although considerable research efforts have been made, existing libraries cannot adequately support diverse algorithmic development (e.g., diverse topology and flexible message exchange), and inconsistent dataset and model usage in experiments make fair comparisons difficult. In this work, we introduce FedML, an open research library and benchmark that facilitates the development of new federated learning algorithms and fair performance comparisons. FedML supports three computing paradigms (distributed training, mobile on-device training, and standalone simulation) for users to conduct experiments in different system environments. FedML also promotes diverse algorithmic research with flexible and generic API design and reference baseline implementations. A curated and comprehensive benchmark dataset for the non-I.I.D setting aims at making a fair comparison. We believe FedML can provide an efficient and reproducible means of developing and evaluating algorithms for the federated learning research community. We maintain the source code, documents, and user community at https://FedML.ai.

* We maintain the source code, documents, and user community at https://fedml.ai

Via

Access Paper or Ask Questions