Abstract: Traffic signal controllers play an essential role in the traffic system, yet most controllers in current use are not flexible or adaptive enough to produce optimal traffic schedules. In this paper, we present an approach to learning policies for signal controllers using deep reinforcement learning. Our method uses a novel formulation of the reward function that simultaneously considers efficiency and equity. We furthermore present a general approach to finding the bound for the proposed equity factor. Moreover, we introduce an adaptive discounting approach that greatly stabilizes learning, which helps maintain high flexibility in green light duration. Experimental evaluations on both simulated and real-world data demonstrate that our proposed algorithm achieves state-of-the-art performance (previously held by traditional non-learning methods) across a wide range of traffic situations. A video of our experimental results can be found at: https://youtu.be/3rc5-ac3XX0