Due to increasing penetration of volatile distributed photovoltaic (PV) resources, real-time monitoring of customers at the grid-edge has become a critical task. However, this requires solving the distribution system state estimation (DSSE) jointly for both primary and secondary levels of distribution grids, which is computationally complex and lacks scalability to large systems. To achieve near real-time solutions for DSSE, we present a novel hierarchical reinforcement learning-aided framework: at the first layer, a weighted least squares (WLS) algorithm solves the DSSE over primary medium-voltage feeders; at the second layer, deep actor-critic (A-C) modules are trained for each secondary transformer using measurement residuals to estimate the states of low-voltage circuits and capture the impact of PVs at the grid-edge. While the A-C parameter learning process takes place offline, the trained A-C modules are deployed online for fast secondary grid state estimation; this is the key factor in scalability and computational efficiency of the framework. To maintain monitoring accuracy, the two levels exchange boundary information with each other at the secondary nodes, including transformer voltages (first layer to second layer) and active/reactive total power injection (second layer to first layer). This interactive information passing strategy results in a closed-loop structure that is able to track optimal solutions at both layers in few iterations. Moreover, our model can handle the topology changes using the Jacobian matrices of the first layer. We have performed numerical experiments using real utility data and feeder models to verify the performance of the proposed framework.