Moving target defense (MTD) is a proactive defense approach that aims to thwart attacks by continuously changing the attack surface of a system (e.g., changing host or network configurations), thereby increasing the adversary's uncertainty and attack cost. To maximize the impact of MTD, a defender must strategically choose when and what changes to make, taking into account both the characteristics of its system as well as the adversary's observed activities. Finding an optimal strategy for MTD presents a significant challenge, especially when facing a resourceful and determined adversary who may respond to the defender's actions. In this paper, we propose finding optimal MTD strategies using deep reinforcement learning. Based on an established model of adaptive MTD, we formulate finding an MTD strategy as finding a policy for a partially-observable Markov decision process. To significantly improve training performance, we introduce compact memory representations. To demonstrate our approach, we provide thorough numerical results, showing significant improvement over existing strategies.