Deep learning applications require the optimization of nonconvex objective functions. Such functions typically have multiple local minima, which makes their optimization challenging. Simulated Annealing is a well-established method for optimizing such functions, but its efficiency depends on the efficiency of the adopted sampling method. We explore the relation between Langevin dynamics and stochastic optimization. By combining the Momentum optimizer with Simulated Annealing, we propose CoolMomentum, a prospective stochastic optimization method. Empirical results confirm the effectiveness of the proposed theoretical approach.
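As a rough illustration of the idea, the sketch below applies a Momentum-style update whose momentum coefficient is gradually "cooled" during training, in analogy with lowering the temperature in Simulated Annealing. The schedule `rho_schedule`, the hyperparameter values, and the toy quadratic objective are illustrative assumptions for this sketch, not the settings prescribed by the method.

```python
import numpy as np

def coolmomentum_step(theta, velocity, grad, lr, rho):
    """One Momentum-style update with momentum coefficient rho."""
    velocity = rho * velocity - lr * grad
    theta = theta + velocity
    return theta, velocity

def rho_schedule(step, total_steps, rho_start=0.99, rho_end=0.0):
    """Hypothetical linear cooling schedule: lower rho over training,
    which plays the role of decreasing the temperature."""
    frac = min(step / total_steps, 1.0)
    return rho_start + frac * (rho_end - rho_start)

# Toy usage: minimize a quadratic with a noisy gradient (stochastic setting).
rng = np.random.default_rng(0)
A = np.diag([1.0, 10.0])
theta = np.array([5.0, -3.0])
velocity = np.zeros_like(theta)
total_steps = 1000
for step in range(total_steps):
    grad = 2.0 * A @ theta + 0.1 * rng.normal(size=theta.shape)  # noisy gradient
    rho = rho_schedule(step, total_steps)
    theta, velocity = coolmomentum_step(theta, velocity, grad, lr=0.01, rho=rho)
print(theta)  # should end up close to the minimum at the origin
```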