Abstract: The building sector accounts for the largest share of energy consumption worldwide, and there has been considerable research interest in the energy consumption and comfort management of buildings. Inspired by recent advances in reinforcement learning (RL), this paper assesses the potential of RL for building climate control problems with occupant interaction. We apply a recent RL approach, deep deterministic policy gradient (DDPG), to continuous building control tasks and assess its performance in simulation studies in terms of its ability to handle (a) partial state observability due to sensor limitations; (b) a complex stochastic system with a high-dimensional state space that is jointly continuous and discrete; and (c) uncertainties due to ambient weather conditions, occupant behavior, and occupants' perceived comfort. In particular, the partial observability and uncertainty arising from occupant interaction significantly complicate the control problem. In the simulation studies, the policy learned by DDPG demonstrates reasonable performance and computational tractability.