Envisioned as the next-generation transceiver technology, the holographic multiple-input-multiple-output (HMIMO) garners attention for its superior capabilities of fabricating electromagnetic (EM) waves. However, the densely packed antenna elements significantly increase the dimension of the HMIMO channel matrix, rendering traditional channel estimation methods inefficient. While the dimension curse can be relieved to avoid the proportional increase with the antenna density using the state-of-the-art wavenumber-domain sparse representation, the sparse recovery complexity remains tied to the order of non-zero elements in the sparse channel, which still considerably exceeds the number of scatterers. By modeling the inherent clustered sparsity using a Gaussian mixed model (GMM)-based von Mises-Fisher (vMF) distribution, the to-be-estimated channel characteristics can be compressed to the scatterer level. Upon the sparsity extraction, a novel wavenumber-domain expectation-maximization (WD-EM) algorithm is proposed to implement the cluster-by-cluster variational inference, thus significantly reducing the computational complexity. Simulation results verify the robustness of the proposed scheme across overheads and signal-to-noise ratio (SNR).