As field-programmable gate arrays (FPGAs) become prevalent in critical application domains, their power consumption is a major concern. In this paper, we present and evaluate a power monitoring scheme capable of accurately estimating the runtime dynamic power of FPGAs at a fine-grained timescale, in order to support emerging power management techniques. In particular, we describe a novel, specialized ensemble model that can be decomposed into multiple customized decision-tree-based base learners. To aid in model synthesis, we propose a generic computer-aided design flow that generates samples, selects features, tunes hyperparameters, and trains the ensemble estimator. In addition, we present a hardware realization of the trained ensemble estimator for on-chip real-time power estimation. In the experiments, we first show that a single decision tree model achieves a prediction error within 4.51% of a commercial gate-level power estimation tool, which is 2.41--6.07x lower than that of the commonly used linear model. More importantly, we study the additional gains in inference accuracy obtained with the proposed ensemble model. Experimental results show that the ensemble monitoring method further improves the accuracy of power predictions to within a maximum error of 1.90%. Moreover, the lookup table (LUT) overhead of the ensemble monitoring hardware employing up to 64 base learners is within 1.22% of the target FPGA, indicating that it is lightweight and scalable.
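To make the idea of an ensemble decomposed into decision-tree base learners concrete, the following is a minimal sketch, not the estimator described in this paper. It assumes that each base learner is a small binary regression tree over a vector of signal-activity features and that the ensemble output is the unweighted mean of the base-learner outputs; the combination rule, the names `Node`, `predict_tree`, and `predict_ensemble`, and all thresholds and power values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class Node:
    """Internal tree node: compares one feature against a threshold."""
    feature: int      # index into the feature vector (hypothetical activity counter)
    threshold: float  # split point (illustrative value, not from the paper)
    left: "Tree"      # subtree taken when features[feature] <= threshold
    right: "Tree"     # subtree taken otherwise


# A leaf is simply a float: the power estimate for that region (hypothetical units).
Tree = Union[Node, float]


def predict_tree(tree: Tree, features: Sequence[float]) -> float:
    """Walk one decision tree from the root down to a leaf value."""
    while isinstance(tree, Node):
        if features[tree.feature] <= tree.threshold:
            tree = tree.left
        else:
            tree = tree.right
    return tree


def predict_ensemble(trees: Sequence[Tree], features: Sequence[float]) -> float:
    """Assumed combination rule: unweighted mean of the base-learner outputs."""
    return sum(predict_tree(t, features) for t in trees) / len(trees)


# Two hand-written base learners over two illustrative features.
trees = [
    Node(0, 10.0, 100.0, Node(1, 5.0, 150.0, 200.0)),
    Node(1, 3.0, 110.0, 190.0),
]

estimate = predict_ensemble(trees, [12, 4])
```

Because every base learner is evaluated independently before the final combination, a structure of this kind maps naturally onto parallel comparators in hardware, which is one plausible reason a tree-based ensemble suits on-chip estimation.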