Longji Zheng

Enhancing Traffic Signal Control through Model-based Reinforcement Learning and Policy Reuse

Mar 11, 2025

Yihong Li, Chengwei Zhang, Furui Zhan, Wanting Liu, Kailing Zhou, Longji Zheng

Figures 1-4 for Enhancing Traffic Signal Control through Model-based Reinforcement Learning and Policy Reuse

Abstract: Multi-agent reinforcement learning (MARL) has shown significant potential in traffic signal control (TSC). However, current MARL-based methods often generalize poorly because they are trained on fixed traffic patterns and road network conditions. This limitation results in poor adaptability to new traffic scenarios, leading to high retraining costs and complex deployment. To address this challenge, we propose two algorithms: PLight and PRLight. PLight employs a model-based reinforcement learning approach, pretraining control policies and environment models on predefined source-domain traffic scenarios. The environment model predicts state transitions, which facilitates the comparison of environmental features. PRLight further enhances adaptability by selecting pre-trained PLight agents according to the similarity between the source and target domains, which accelerates learning in the target domain. We evaluated the algorithms in two transfer settings: (1) adaptability to different traffic scenarios within the same road network, and (2) generalization across different road networks. The results show that PRLight significantly reduces adaptation time compared to learning from scratch in new TSC scenarios, achieving optimal performance by exploiting similarities between the available source scenarios and the target scenario.
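The similarity-based selection described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how similarity is computed, so everything here is an assumption made for illustration: similarity is approximated by how well each source-domain environment model predicts transitions observed in the target domain, and reuse weights come from a softmax over negative prediction errors. The names `prediction_error`, `selection_weights`, and the two toy models are hypothetical and are not taken from the paper.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

State = Sequence[float]
Transition = Tuple[State, int, State]  # (state, action, next_state)
Model = Callable[[State, int], State]  # hypothetical learned environment model


def prediction_error(model: Model, transitions: List[Transition]) -> float:
    """Mean squared error of a source model on target-domain transitions."""
    total = 0.0
    for state, action, next_state in transitions:
        predicted = model(state, action)
        squared = sum((p - t) ** 2 for p, t in zip(predicted, next_state))
        total += squared / len(next_state)
    return total / len(transitions)


def selection_weights(errors: Dict[str, float], temperature: float = 1.0) -> Dict[str, float]:
    """Softmax over negative errors: lower error gives a higher reuse weight."""
    lowest = min(errors.values())  # subtract the minimum for numerical stability
    scores = {name: math.exp(-(err - lowest) / temperature) for name, err in errors.items()}
    norm = sum(scores.values())
    return {name: score / norm for name, score in scores.items()}


# Toy stand-ins for pretrained source models (assumed, not from the paper).
# State is [queue_length]; action 1 = green (discharges 3 cars), 0 = red.
def light_traffic_model(state: State, action: int) -> State:
    return [max(state[0] + 1 - 3 * action, 0)]  # 1 arrival per step


def heavy_traffic_model(state: State, action: int) -> State:
    return [max(state[0] + 4 - 3 * action, 0)]  # 4 arrivals per step


# Transitions observed in the target domain (toy data with 4 arrivals per step).
target_transitions: List[Transition] = [
    ([5.0], 1, [6.0]),
    ([6.0], 0, [10.0]),
    ([10.0], 1, [11.0]),
]

errors = {
    "light": prediction_error(light_traffic_model, target_transitions),
    "heavy": prediction_error(heavy_traffic_model, target_transitions),
}
weights = selection_weights(errors)
best_source = min(errors, key=errors.get)  # source whose model fits the target best
```

In this toy setup the heavy-traffic model reproduces the target transitions exactly, so its associated policy would be the preferred one to reuse. An actual system would replace the hand-written models with learned ones and might combine several source policies rather than pick a single one.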
