We consider online learning for optimal network slice placement under the assumption that slice requests arrive according to a non-stationary Poisson process. We propose a framework based on Deep Reinforcement Learning (DRL) combined with a heuristic to design algorithms. We specifically design two pure-DRL algorithms and two families of hybrid DRL-heuristic algorithms. To validate their performance, we perform extensive simulations in the context of a large-scale operator infrastructure. The evaluation results show that the proposed hybrid DRL-heuristic algorithms require three orders of magnitude of learning episodes less than pure-DRL to achieve convergence. This result indicates that the proposed hybrid DRL-heuristic approach is more reliable than pure-DRL in a real non-stationary network scenario.