We present a classical control mechanism for quantum devices based on reinforcement learning. We apply our strategy to the Quantum Approximate Optimization Algorithm (QAOA) to optimize an objective function that encodes the solution to a hard combinatorial problem. The method provides optimal control of the quantum device by reformulating QAOA as an environment in which an autonomous classical agent interacts and performs actions to achieve higher rewards. This formulation allows a hybrid classical-quantum device to train itself from previous executions, using a continuous formulation of deep Q-learning to control the continuous degrees of freedom of QAOA. Our approach makes selective use of quantum measurements to complete the observations of the quantum state available to the agent. We test this approach on MAXCUT instances of size up to N = 21 and obtain optimal results. We also show how this formulation can be used to transfer knowledge from shorter training episodes to a regime of longer executions, where QAOA delivers better results.
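
To make the environment view of QAOA concrete, the following is a minimal sketch, not the implementation used in this work. It simulates QAOA for MAXCUT on a classical statevector with NumPy and exposes one QAOA layer as one environment step, where the action is a pair of continuous angles. The names `maxcut_costs`, `apply_mixer` and `QAOAEnv` are hypothetical. Using the exact expected cut as the observation and its improvement as the reward are assumptions made for illustration; on a device, the observation would have to be estimated from measurements.

```python
import numpy as np


def maxcut_costs(n, edges):
    """Cut value of every n-bit assignment (diagonal of the cost operator)."""
    z = np.arange(2 ** n)
    bits = (z[:, None] >> np.arange(n)) & 1  # bits[z, i] is bit i of z
    return sum((bits[:, i] != bits[:, j]).astype(float) for i, j in edges)


def apply_mixer(state, n, beta):
    """Apply exp(-i * beta * X) to every qubit of the statevector."""
    c, s = np.cos(beta), -1j * np.sin(beta)
    for q in range(n):
        # Middle axis of the reshaped array indexes bit q of the basis state.
        psi = state.reshape(2 ** (n - q - 1), 2, 2 ** q)
        zero, one = psi[:, 0, :], psi[:, 1, :]
        state = np.stack([c * zero + s * one, s * zero + c * one], axis=1)
        state = state.reshape(-1)
    return state


class QAOAEnv:
    """Hypothetical episodic environment: one step applies one QAOA layer."""

    def __init__(self, n, edges, depth):
        self.n, self.depth = n, depth
        self.costs = maxcut_costs(n, edges)
        self.reset()

    def reset(self):
        # Start every episode in the uniform superposition.
        dim = 2 ** self.n
        self.state = np.full(dim, 1.0 / np.sqrt(dim), dtype=complex)
        self.t = 0
        return self.expected_cut()

    def expected_cut(self):
        # Exact expectation value (assumption: a device would sample this).
        return float(np.real(np.vdot(self.state, self.costs * self.state)))

    def step(self, gamma, beta):
        """Action = (gamma, beta); reward = gain in expected cut (assumed)."""
        before = self.expected_cut()
        self.state = np.exp(-1j * gamma * self.costs) * self.state  # cost layer
        self.state = apply_mixer(self.state, self.n, beta)          # mixer layer
        self.t += 1
        after = self.expected_cut()
        return after, after - before, self.t >= self.depth
```

An agent with a continuous action space would call `reset()` at the start of each episode and then `step(gamma, beta)` up to `depth` times, for example `QAOAEnv(3, [(0, 1), (1, 2), (0, 2)], depth=4)` for a triangle graph. The exponential memory cost of the statevector limits this sketch to small N.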