Stacked intelligent metasurface (SIM) is an emerging programmable metasurface architecture that can implement signal processing directly in the electromagnetic wave domain, thereby enabling efficient implementation of ultra-massive multiple-input multiple-output (MIMO) transceivers with a limited number of radio frequency (RF) chains. Channel estimation (CE) is challenging for SIM-enabled communication systems due to the multi-layer architecture of SIM, and because we need to estimate large dimensional channels between the SIM and users with a limited number of RF chains. To efficiently solve this problem, we develop a novel hybrid digital-wave domain channel estimator, in which the received training symbols are first processed in the wave domain within the SIM layers, and then processed in the digital domain. The wave domain channel estimator, parametrized by the phase shifts applied by the meta-atoms in all layers, is optimized to minimize the mean squared error (MSE) using a gradient descent algorithm, within which the digital part is optimally updated. For an SIM-enabled multi-user system equipped with 4 RF chains and a 6-layer SIM with 64 meta-atoms each, the proposed estimator yields an MSE that is very close to that achieved by fully digital CE in a massive MIMO system employing 64 RF chains. This high CE accuracy is achieved at the cost of a training overhead that can be reduced by exploiting the potential low rank of channel correlation matrices.