Stacked intelligent metasurfaces (SIMs) represent a novel signal processing paradigm that enables over-the-air processing of electromagnetic waves at the speed of light. Their multi-layer architecture exhibits customizable computational capabilities compared to conventional single-layer reconfigurable intelligent surfaces and metasurface lenses. In this paper, we deploy SIM to improve the performance of multi-user multiple-input single-output (MISO) wireless systems through a low complexity manner with reduced numbers of transmit radio frequency chains. In particular, an optimization formulation for the joint design of the SIM phase shifts and the transmit power allocation is presented, which is efficiently tackled via a customized deep reinforcement learning (DRL) approach that systematically explores pre-designed states of the SIM-parametrized smart wireless environment. The presented performance evaluation results demonstrate the proposed method's capability to effectively learn from the wireless environment, while consistently outperforming conventional precoding schemes under low transmit power conditions. Furthermore, the implementation of hyperparameter tuning and whitening process significantly enhance the robustness of the proposed DRL framework.