Fine-grained runtime power management techniques could be promising solutions for power reduction. Therefore, it is essential to establish accurate power monitoring schemes to obtain dynamic power variation in a short period (i.e., tens or hundreds of clock cycles). In this paper, we leverage a decision-tree-based power modeling approach to establish fine-grained hardware power monitoring on FPGA platforms. A generic and complete design flow is developed to implement the decision tree power model which is capable of precisely estimating dynamic power in a fine-grained manner. A flexible architecture of the hardware power monitoring is proposed, which can be instrumented in any RTL design for runtime power estimation, dispensing with the need for extra power measurement devices. Experimental results of applying the proposed model to benchmarks with different resource types reveal an average error up to 4% for dynamic power estimation. Moreover, the overheads of area, power and performance incurred by the power monitoring circuitry are extremely low. Finally, we apply our power monitoring technique to the power management using phase shedding with an on-chip multi-phase regulator as a proof of concept and the results demonstrate 14% efficiency enhancement for the power supply of the FPGA internal logic.