Stability is a basic requirement when studying the behavior of dynamical systems. However, stabilizing dynamical systems via reinforcement learning is challenging because only little data can be collected over short time horizons before instabilities are triggered and data become meaningless. This work introduces a reinforcement learning approach that is formulated over latent manifolds of unstable dynamics so that stabilizing policies can be trained from few data samples. The unstable manifolds are minimal in the sense that they contain the lowest dimensional dynamics that are necessary for learning policies that guarantee stabilization. This is in stark contrast to generic latent manifolds that aim to approximate all -- stable and unstable -- system dynamics and thus are higher dimensional and often require higher amounts of data. Experiments demonstrate that the proposed approach stabilizes even complex physical systems from few data samples for which other methods that operate either directly in the system state space or on generic latent manifolds fail.