Yizhou Huang

Learning Velocity and Acceleration: Self-Supervised Motion Consistency for Pedestrian Trajectory Prediction

Mar 31, 2025

Yizhou Huang, Yihua Cheng, Kezhi Wang

Abstract: Understanding human motion is crucial for accurate pedestrian trajectory prediction. Conventional methods typically rely on supervised learning, in which predicted trajectories are optimized directly against ground-truth labels. This amplifies the limitations caused by long-tailed data distributions, making it difficult for the model to capture abnormal behaviors. In this work, we propose a self-supervised pedestrian trajectory prediction framework that explicitly models position, velocity, and acceleration. We leverage velocity and acceleration information to enhance position prediction through feature injection and a self-supervised motion consistency mechanism. Our model hierarchically injects velocity features into the position stream and acceleration features into the velocity stream, enabling it to predict position, velocity, and acceleration jointly. From the predicted positions, we compute the corresponding pseudo velocity and acceleration, allowing the model to learn from data-generated pseudo labels and thus achieve self-supervised learning. We further design a motion consistency evaluation strategy grounded in physical principles: it selects the most plausible predicted motion trend by comparing it with historical dynamics and uses this trend to guide and constrain trajectory generation. We conduct experiments on the ETH-UCY and Stanford Drone datasets, demonstrating that our method achieves state-of-the-art performance on both.
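A minimal sketch of the pseudo-label idea, assuming predicted positions come as a (T, 2) array sampled every `dt` seconds (0.4 s is assumed here, the 2.5 Hz rate commonly used for ETH-UCY). Pseudo velocity and acceleration are the first and second finite differences of the predicted positions, and a consistency term penalizes disagreement with the velocity and acceleration streams. The function names and the unweighted L2 loss are illustrative assumptions, not the paper's code.

```python
import numpy as np

def pseudo_derivatives(positions: np.ndarray, dt: float = 0.4):
    """Finite-difference pseudo labels from a (T, 2) position track."""
    velocity = np.diff(positions, axis=0) / dt      # (T-1, 2)
    acceleration = np.diff(velocity, axis=0) / dt   # (T-2, 2)
    return velocity, acceleration

def motion_consistency_loss(pred_pos, pred_vel, pred_acc, dt: float = 0.4) -> float:
    """Mean L2 gap between the predicted velocity/acceleration streams and
    the pseudo labels derived from the predicted positions (assumed loss form)."""
    pseudo_vel, pseudo_acc = pseudo_derivatives(pred_pos, dt)
    vel_term = np.linalg.norm(pred_vel - pseudo_vel, axis=-1).mean()
    acc_term = np.linalg.norm(pred_acc - pseudo_acc, axis=-1).mean()
    return float(vel_term + acc_term)
```

No ground-truth future is needed for this term: a constant-velocity track with matching velocity predictions and zero acceleration predictions yields a loss of exactly zero.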

Trajectory Mamba: Efficient Attention-Mamba Forecasting Model Based on Selective SSM

Mar 13, 2025

Yizhou Huang, Yihua Cheng, Kezhi Wang

Abstract: Motion prediction is crucial for autonomous driving, as it enables accurate forecasting of future vehicle trajectories from historical inputs. This paper introduces Trajectory Mamba, a novel, efficient trajectory prediction framework based on the selective state-space model (SSM). Conventional attention-based models incur computational costs that grow quadratically with the number of targets, hindering their application in highly dynamic environments. In response, we leverage the SSM to redesign the self-attention mechanism in the encoder-decoder architecture, thereby achieving linear time complexity. To offset the potential loss of prediction accuracy caused by modifying the attention mechanism, we propose a joint polyline encoding strategy that better captures the associations between static and dynamic contexts. Additionally, to balance prediction accuracy and inference speed, we adopt a decoder whose design differs entirely from that of the encoder. Through cross-state space attention, all target agents share the scene context, allowing the SSM to interact with the shared scene representation during decoding and thus infer different trajectories over the next prediction steps. Our model achieves state-of-the-art inference speed and parameter efficiency on both the Argoverse 1 and Argoverse 2 datasets. It reduces FLOPs four-fold compared to existing methods and cuts the parameter count by over 40%, while surpassing the performance of the vast majority of previous methods. These findings validate the effectiveness of Trajectory Mamba for trajectory prediction.

* Accepted by CVPR 2025
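To make the linear-complexity claim concrete, here is a minimal NumPy sketch of a generic Mamba-style selective scan, not Trajectory Mamba's encoder or its cross-state space attention. It assumes a diagonal state matrix `A` and input-dependent step size `delta` and matrices `B`, `C` produced by hypothetical projections `W_delta`, `W_B`, `W_C`. One pass over the token sequence (for example, per-agent or per-polyline tokens) costs O(T·D·N), linear in sequence length T, whereas self-attention compares every pair of tokens.

```python
import numpy as np

def selective_scan(x, A, W_delta, W_B, W_C):
    """Sequential selective SSM scan over tokens x of shape (T, D).

    A: (D, N) negative diagonal state matrix, one row per channel.
    W_delta: (D, D), W_B: (D, N), W_C: (D, N) are illustrative projections
    that make the step size and the input/output matrices input-dependent.
    """
    T, D = x.shape
    N = A.shape[1]
    h = np.zeros((D, N))                      # hidden state per channel
    ys = np.empty((T, D))
    for t in range(T):
        delta = np.logaddexp(0.0, x[t] @ W_delta)   # (D,) softplus step size
        B = x[t] @ W_B                               # (N,) input matrix
        C = x[t] @ W_C                               # (N,) output matrix
        A_bar = np.exp(delta[:, None] * A)           # (D, N) discretized decay
        h = A_bar * h + (delta[:, None] * B[None, :]) * x[t][:, None]
        ys[t] = h @ C                                # (D,) readout
    return ys
```

Because the state is updated left to right, output t depends only on tokens up to t, and the memory footprint is a fixed (D, N) state regardless of how many tokens are scanned.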

Resource-Efficient Affordance Grounding with Complementary Depth and Semantic Prompts

Mar 04, 2025

Yizhou Huang, Fan Yang, Guoliang Zhu, Gen Li, Hao Shi, Yukun Zuo, Wenrui Chen, Zhiyong Li, Kailun Yang

Abstract: Affordance refers to the functional properties that an agent perceives and utilizes from its environment, and it is key perceptual information that robots require to perform actions. This information is rich and multimodal in nature. Existing multimodal affordance methods are limited in extracting useful information, mainly because of simple structural designs, basic fusion methods, and large parameter counts, which make it difficult to meet the performance requirements of practical deployment. To address these issues, this paper proposes BiT-Align, an image-depth-text affordance mapping framework. The framework includes a Bypass Prompt Module (BPM) and a Text Feature Guidance (TFG) attention selection mechanism. BPM feeds the auxiliary depth modality directly as a prompt to the primary RGB modality, embedding it into the primary-modality encoder without introducing additional encoders. This reduces the model's parameter count and effectively improves functional region localization accuracy. The TFG mechanism uses textual features to guide the selection and enhancement of attention heads in the image encoder, improving the understanding of affordance characteristics. Experimental results show that the proposed method achieves significant performance improvements on the public AGD20K and HICO-IIF datasets. On AGD20K, compared with the current state-of-the-art method, we achieve a 6.0% improvement in the KLD metric while reducing model parameters by 88.8%, demonstrating its practical value. The source code will be made publicly available at https://github.com/DAWDSE/BiT-Align.
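For reference, the KLD metric quoted above is usually computed, as in saliency evaluation, by normalizing both the predicted and ground-truth heatmaps to sum to one and measuring the KL divergence of the prediction from the ground truth; lower is better. The sketch below assumes that common formulation and an `eps` of 1e-12; the benchmark's official evaluation code may differ in normalization details.

```python
import numpy as np

def kld(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-12) -> float:
    """KL divergence between ground-truth and predicted affordance heatmaps.

    Both maps are normalized to probability distributions; eps guards against
    division by zero and log(0). Lower values mean a better match.
    """
    p = pred / (pred.sum() + eps)
    g = gt / (gt.sum() + eps)
    return float(np.sum(g * np.log(eps + g / (p + eps))))
```

A perfect prediction scores about 0, while a flat prediction against a single-pixel ground truth on an 8×8 grid scores log 64 ≈ 4.16.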

Efficient Driving Behavior Narration and Reasoning on Edge Device Using Large Language Models

Sep 30, 2024

Yizhou Huang, Yihua Cheng, Kezhi Wang

Abstract: Deep learning architectures with powerful reasoning capabilities have driven significant advances in autonomous driving technology. Large language models (LLMs) applied in this field can describe driving scenes and behaviors with accuracy comparable to human perception, particularly in visual tasks. Meanwhile, the rapid development of edge computing, with its advantage of proximity to data sources, has made edge devices increasingly important in autonomous driving. Edge devices process data locally, reducing transmission delays and bandwidth usage and achieving faster response times. In this work, we propose a driving behavior narration and reasoning framework that applies LLMs to edge devices. The framework consists of multiple roadside units, each hosting an LLM. These roadside units collect road data and communicate via 5G NSR/NR networks. Our experiments show that LLMs deployed on edge devices can achieve satisfactory response speeds. Additionally, we propose a prompt strategy to enhance the system's narration and reasoning performance. This strategy integrates multi-modal information, including environmental, agent, and motion data. Experiments on the OpenDV-Youtube dataset demonstrate that our approach significantly improves performance on both tasks.

* Submitted for possible journal publication
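The prompt strategy above combines environmental, agent, and motion information. One way to picture this is a structured prompt assembled from those three sources. The sketch below is purely illustrative: the `SceneContext` fields, section labels, and instruction wording are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class SceneContext:
    environment: str                                           # hypothetical: weather, road type, lighting
    agents: list[str] = field(default_factory=list)           # hypothetical: detected road users
    ego_motion: dict[str, float] = field(default_factory=dict)  # hypothetical: speed, heading, ...

# Hypothetical task instructions for the two tasks named in the abstract.
INSTRUCTIONS = {
    "narration": "Describe what the ego vehicle is doing.",
    "reasoning": "Explain why the ego vehicle is behaving this way.",
}

def build_prompt(ctx: SceneContext, task: str) -> str:
    """Assemble one text prompt from environment, agent, and motion fields."""
    if task not in INSTRUCTIONS:
        raise ValueError(f"unknown task: {task!r}")
    agent_lines = "\n".join(f"- {a}" for a in ctx.agents) or "- none detected"
    motion_lines = "\n".join(
        f"- {k}: {v:.1f}" for k, v in sorted(ctx.ego_motion.items())
    ) or "- unavailable"
    return (
        f"Environment: {ctx.environment}\n"
        f"Agents:\n{agent_lines}\n"
        f"Ego motion:\n{motion_lines}\n"
        f"Task: {INSTRUCTIONS[task]}"
    )
```

Keeping each modality in a labeled section makes it easy to ablate one source, for example dropping the motion block, when comparing prompt variants.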

Stochastic Planning for ASV Navigation Using Satellite Images

Sep 23, 2022

Yizhou Huang, Hamza Dugmag, Timothy D. Barfoot, Florian Shkurti

Abstract: Autonomous surface vessels (ASVs) represent a promising technology for automating water-quality monitoring of lakes. In this work, we use satellite images as a coarse map and plan sampling routes for the robot. However, inconsistencies between the satellite images and the actual lake, as well as environmental disturbances such as wind, aquatic vegetation, and changing water levels, can make it difficult for robots to reach places suggested by the prior map. This paper presents a robust route-planning algorithm that minimizes the expected total travel distance given these environmental disturbances, which induce uncertainty in the map. We verify the efficacy of our algorithm in simulations on over a thousand Canadian lakes and demonstrate an application of our algorithm in a 3.7 km-long real-world robot experiment on a lake in Northern Ontario, Canada.

* 7 pages, 5 figures
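The central idea, minimizing expected rather than nominal travel distance, can be illustrated with a deliberately simplified model. This is not the paper's algorithm. Each edge of a hypothetical route graph is assumed to be blocked with probability `p_blocked`, which costs an extra `detour_penalty`, and Dijkstra's algorithm then runs on the expected edge costs. A full stochastic planner would also replan after observing whether an edge is actually passable. This sketch does not.

```python
import heapq

def expected_edge_cost(length: float, p_blocked: float, detour_penalty: float) -> float:
    """Expected distance of one edge under an assumed blockage model."""
    return length + p_blocked * detour_penalty

def plan_route(graph, start, goal):
    """Dijkstra on expected costs.

    graph: {node: [(neighbor, length, p_blocked, detour_penalty), ...]}
    Returns (path, expected_cost), or (None, inf) if goal is unreachable.
    """
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == goal:
            break
        for v, length, p, penalty in graph.get(u, []):
            nd = d + expected_edge_cost(length, p, penalty)
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if goal not in dist:
        return None, float("inf")
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1], dist[goal]
```

For example, a 2.0-long route through a channel that is blocked half the time with a 4.0 detour has an expected length of 4.0, so a clear 3.5-long direct route is preferred even though it is nominally longer.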

Continual Model-Based Reinforcement Learning with Hypernetworks

Sep 25, 2020

Yizhou Huang, Kevin Xie, Homanga Bharadhwaj, Florian Shkurti

Abstract: Effective planning in model-based reinforcement learning (MBRL) and model-predictive control (MPC) relies on the accuracy of the learned dynamics model. In many instances of MBRL and MPC, this model is assumed to be stationary and is periodically retrained from scratch on state transition experience collected since the beginning of environment interactions. This implies that the time required to train the dynamics model, and hence the pause required between plan executions, grows linearly with the size of the collected experience. We argue that this is too slow for lifelong robot learning and propose HyperCRL, a method that continually learns the encountered dynamics in a sequence of tasks using task-conditional hypernetworks. Our method has three main attributes: first, it enables constant-time dynamics learning sessions between planning and only needs to store the most recent fixed-size portion of the state transition experience; second, it uses fixed-capacity hypernetworks to represent non-stationary and task-aware dynamics; third, it outperforms existing continual learning alternatives that rely on fixed-capacity networks and performs competitively with baselines that remember an ever-increasing coreset of past experience. We show that HyperCRL is effective for continual model-based reinforcement learning in robot locomotion and manipulation scenarios, such as tasks involving pushing and door opening. Our project website with code and videos is at http://rvl.cs.toronto.edu/blog/2020/hypercrl/

* 13 pages, 6 figures. Preliminary report, under review
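A task-conditional hypernetwork can be pictured as a network whose output is the full weight vector of another network. The NumPy sketch below assumes a linear hypernetwork that maps a per-task embedding to the weights of a one-hidden-layer dynamics MLP predicting a state change. All sizes and names are hypothetical, training is omitted, and so is the regularizer that continual-learning hypernetworks typically use to keep outputs for earlier task embeddings stable. It only shows why capacity stays fixed: adding a task adds one small embedding, not a new model.

```python
import numpy as np

class TaskHypernetwork:
    """Maps a task embedding to the weights of a small dynamics MLP (sketch)."""

    def __init__(self, n_tasks, emb_dim, state_dim, action_dim, hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.in_dim, self.hidden, self.out_dim = state_dim + action_dim, hidden, state_dim
        # Number of target-network parameters: W1, b1, W2, b2.
        self.n_target = (self.in_dim * hidden + hidden
                         + hidden * self.out_dim + self.out_dim)
        self.task_emb = 0.1 * rng.standard_normal((n_tasks, emb_dim))
        self.W_hyper = 0.1 * rng.standard_normal((emb_dim, self.n_target))
        self.b_hyper = np.zeros(self.n_target)

    def generate(self, task_id):
        """Produce (W1, b1, W2, b2) for one task from its embedding."""
        theta = self.task_emb[task_id] @ self.W_hyper + self.b_hyper
        i, h, o = self.in_dim, self.hidden, self.out_dim
        k = 0
        W1 = theta[k:k + i * h].reshape(i, h); k += i * h
        b1 = theta[k:k + h]; k += h
        W2 = theta[k:k + h * o].reshape(h, o); k += h * o
        b2 = theta[k:k + o]
        return W1, b1, W2, b2

    def predict_next_state(self, task_id, state, action):
        """Run the generated MLP and add the predicted delta to the state."""
        W1, b1, W2, b2 = self.generate(task_id)
        z = np.concatenate([state, action])
        return state + np.tanh(z @ W1 + b1) @ W2 + b2
```

For a 5-dimensional state, 2-dimensional action, and 16 hidden units, each task's dynamics model has 213 parameters, all generated from that task's 4-dimensional embedding.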

Zeus: A System Description of the Two-Time Winner of the Collegiate SAE AutoDrive Competition

Apr 19, 2020

Keenan Burnett, Jingxing Qian, Xintong Du, Linqiao Liu, David J. Yoon, Tianchang Shen, Susan Sun, Sepehr Samavi, Michael J. Sorocky, Mollie Bianchi (+7 more)

Abstract: The SAE AutoDrive Challenge is a three-year collegiate competition to develop a self-driving car by 2020. The second year of the competition was held in June 2019 at MCity, a mock town built for self-driving car testing at the University of Michigan. Teams were required to autonomously navigate a series of intersections while handling pedestrians, traffic lights, and traffic signs. Zeus is aUToronto's winning entry in the AutoDrive Challenge. This article describes the system design and development of Zeus, as well as many of the lessons learned along the way, including details on the team's organizational structure, sensor suite, software components, and performance at the Year 2 competition. With a team of mostly undergraduates and minimal resources, aUToronto has made progress toward a functioning self-driving vehicle in just two years. This article may prove valuable to researchers looking to develop their own self-driving platform.

* Submitted to the Journal of Field Robotics

Learning of Coordination Policies for Robotic Swarms

Sep 19, 2017

Qiyang Li, Xintong Du, Yizhou Huang, Quinlan Sykora, Angela P. Schoellig

Abstract: Inspired by biological swarms, robotic swarms are envisioned to solve real-world problems that are difficult for individual agents. Biological swarms can achieve collective intelligence based on local interactions and simple rules; however, designing effective distributed policies for large-scale robotic swarms to achieve a global objective can be challenging. Although it is often possible to design an optimal centralized strategy for small numbers of agents, such methods can fail as the number of agents increases. Motivated by the growing success of machine learning, we develop a deep learning approach that learns distributed coordination policies from centralized policies. In contrast to traditional distributed control approaches, which are usually based on human-designed policies for relatively simple tasks, this learning-based approach can be adapted to more difficult tasks. We demonstrate the efficacy of our approach on two tasks: the well-known rendezvous problem and a more difficult particle assignment problem, for which no known distributed policy exists. Extensive simulations show that the performance of the learned coordination policies is comparable to that of the centralized policies and surpasses state-of-the-art distributed policies. Our approach thus provides a promising alternative for real-world coordination problems that would otherwise be computationally expensive to solve or intractable to explore.

* 8 pages, 11 figures, submitted to 2018 IEEE International Conference on Robotics and Automation
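For context on the rendezvous task, the sketch below shows a classic hand-designed distributed rule of the kind the learned policies are contrasted with. It is not the paper's learned policy. Each agent senses only agents within an assumed `radius` and moves a fraction `gain` of the way toward their centroid. With an unlimited radius and `gain=1.0`, it reduces to the centralized solution of sending every agent to the global centroid in one step.

```python
import numpy as np

def local_rendezvous_step(pos: np.ndarray, radius: float, gain: float = 0.5) -> np.ndarray:
    """One synchronous update of a local averaging rule for (n, 2) positions.

    Each agent averages the positions of agents within `radius` (itself
    included) and moves a fraction `gain` toward that local centroid.
    """
    dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)  # (n, n)
    neighbors = (dist <= radius).astype(float)       # local sensing mask
    centroids = neighbors @ pos / neighbors.sum(axis=1, keepdims=True)
    return pos + gain * (centroids - pos)

def spread(pos: np.ndarray) -> float:
    """Largest pairwise distance; 0 means all agents have met."""
    return float(np.max(np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)))
```

Because every agent moves to a convex combination of positions it can see, the group's spread never grows. However, with a small sensing radius such rules can split into separate clusters, which is one reason hand-designed local policies struggle on harder coordination tasks.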
