Unifying the correlative single-view satellite image building extraction and height estimation tasks indicates a promising way to share representations and acquire generalist model for large-scale urban 3D reconstruction. However, the common spatial misalignment between building footprints and stereo-reconstructed nDSM height labels incurs degraded performance on both tasks. To address this issue, we propose a Height-hierarchy Guided Dual-decoder Network (HGDNet) to estimate building height. Under the guidance of synthesized discrete height-hierarchy nDSM, auxiliary height-hierarchical building extraction branch enhance the height estimation branch with implicit constraints, yielding an accuracy improvement of more than 6% on the DFC 2023 track2 dataset. Additional two-stage cascade architecture is adopted to achieve more accurate building extraction. Experiments on the DFC 2023 Track 2 dataset shows the superiority of the proposed method in building height estimation ({\delta}1:0.8012), instance extraction (AP50:0.7730), and the final average score 0.7871 ranks in the first place in test phase.