Simulated Annealing is the crowning glory of Markov Chain Monte Carlo Methods for the solution of NP-hard optimization problems in which the cost function is known. Here, by replacing the Metropolis engine of Simulated Annealing with a reinforcement learning variation -- that we call Macau Algorithm -- we show that the Simulated Annealing heuristic can be very effective also when the cost function is unknown and has to be learned by an artificial agent.