This paper investigates a method to improve the performance of predictive thermal control in buildings via online identification and excitation (an active learning process) that minimally disrupts normal operations. In previous studies, we demonstrated scalable methods to acquire multi-zone thermal models of passive buildings using a gray-box approach that leverages building topology and measurement data. Here we extend the method to multi-zone, actively controlled buildings and examine how to improve thermal model estimation by using the controller to excite unknown portions of the building's dynamics. Comparing against a baseline thermostat controller, we demonstrate the utility of both the initially acquired and the improved thermal models within a Model Predictive Control (MPC) framework, which anticipates weather uncertainty and time-varying temperature set-points. A simulation study demonstrates that self-excitation improves model estimation, which corresponds to improved MPC energy savings and occupant comfort. By coupling building topology, estimation, and control routines into a single online framework, we demonstrate the potential for low-cost, scalable methods that actively learn and control buildings to ensure occupant comfort and minimize energy usage, all while using the building's existing HVAC sensors and hardware.