We study the problem of routing and scheduling of real-time flows over a multi-hop millimeter wave (mmWave) mesh. We develop a model-free deep reinforcement learning algorithm that determines which subset of the mmWave links should be activated during each time slot and using what power level. The proposed algorithm, called Adaptive Activator RL (AARL), can handle a variety of network topologies, network loads, and interference models, as well as adapt to different workloads. We demonstrate the operation of AARL on several topologies: a small topology with 10 links, a moderately-sized mesh with 48 links, and a large topology with 96 links. For each topology, the results of AARL are compared to those of a greedy scheduling algorithm. AARL is shown to outperform the greedy algorithm in two aspects. First, its schedule obtains higher goodput. Second, and even more importantly, while the run time of the greedy algorithm renders it impractical for real-time scheduling, the run time of AARL is suitable for meeting the time constraints of typical 5G networks.