This paper demonstrates a data-driven control approach for demand response in real-life residential buildings. The objective is to optimally schedule the heating cycles of the Domestic Hot Water (DHW) buffer to maximize the self-consumption of local photovoltaic (PV) production. A model-based reinforcement learning technique is used to tackle the underlying sequential decision-making problem. The proposed algorithm learns the stochastic occupant behavior, predicts the PV production, and takes the dynamics of the system into account. The algorithm is evaluated in a real-life experiment with six residential buildings. The results show that, compared to the default thermostat control, the self-consumption of the PV production is significantly increased.
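To make the underlying scheduling problem concrete, the sketch below shows a toy version of it: choose binary on/off heating decisions over a short horizon so that heater consumption overlaps the PV forecast, subject to a temperature comfort band on the buffer. This is not the paper's algorithm (which uses model-based reinforcement learning with a learned occupant-behavior model); it is a minimal brute-force planner under an assumed linear tank model, and every parameter value, function name, and forecast in it is illustrative.

```python
# Minimal sketch of DHW scheduling for PV self-consumption.
# Assumptions: a crude linear tank model, perfect PV and hot-water
# draw forecasts, and exhaustive search over a short horizon.
import itertools

HORIZON = 8                  # planning horizon in hours (assumption)
P_HEAT = 2.0                 # electric heater power in kW (assumption)
T_MIN, T_MAX = 45.0, 65.0    # comfort band for tank temperature in deg C


def simulate(t0, schedule, draws):
    """Roll out tank temperature for a binary on/off heating schedule.

    Assumed dynamics: each heated hour raises the temperature by a fixed
    step; hot-water draws and standing losses lower it.
    """
    temps, t = [], t0
    for on, draw in zip(schedule, draws):
        t += 5.0 * on        # temperature gain per heated hour (assumption)
        t -= draw + 0.5      # draw-induced drop plus standing loss
        temps.append(t)
    return temps


def best_schedule(t0, pv_forecast, draw_forecast):
    """Exhaustively search on/off schedules, maximizing self-consumed PV
    energy subject to the comfort band. Feasible for short horizons only."""
    best, best_value = None, float("-inf")
    for schedule in itertools.product([0, 1], repeat=HORIZON):
        temps = simulate(t0, schedule, draw_forecast)
        if any(t < T_MIN or t > T_MAX for t in temps):
            continue  # violates comfort constraints
        # Self-consumed PV energy: heater demand covered by local PV.
        value = sum(min(on * P_HEAT, pv)
                    for on, pv in zip(schedule, pv_forecast))
        if value > best_value:
            best, best_value = schedule, value
    return best, best_value


if __name__ == "__main__":
    pv = [0.0, 0.5, 1.5, 2.5, 2.5, 1.5, 0.5, 0.0]     # kW, midday peak
    draws = [1.0, 0.0, 0.0, 2.0, 0.0, 0.0, 3.0, 1.0]  # deg C drop per hour
    schedule, value = best_schedule(55.0, pv, draws)
    print("schedule:", schedule, "self-consumed PV (kWh):", value)
```

In this toy setting the planner shifts heating toward the midday PV peak; the paper's approach replaces the assumed tank model and perfect forecasts with models learned from data, which is what makes it applicable to real buildings with stochastic occupant behavior.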