Graph matching under node and pairwise constraints is a building block in areas ranging from combinatorial optimization and machine learning to computer vision, where it supports effective structural representation and association. We present a reinforcement learning solver that seeks the node correspondence between two graphs: a node embedding model on the association graph is learned to find the node-to-node matching sequentially. Previous deep graph matching models focus on learning the front-end features and affinity function, whereas our method learns the back-end decision making for a given affinity objective function, whether or not that function is itself obtained by learning. This objective maximization setting fits naturally with reinforcement learning, whose learning procedure is label-free. In addition, the model is not restricted to a fixed number of nodes for matching. These features make it more suitable for practical use. Extensive experiments on synthetic datasets, natural images, and QAPLIB show superior performance in both matching accuracy and efficiency. To the best of our knowledge, this is the first deep reinforcement learning solver for graph matching.
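To make the sequential formulation concrete, the following is a minimal sketch, not the solver presented here. It assumes the common quadratic form of the affinity objective, x^T K x, where K is an affinity matrix over the vertices of the association graph and x is a 0/1 assignment vector. The names `matching_score` and `greedy_sequential_matching`, the row-major vertex indexing, and the greedy selection rule are all illustrative assumptions; the greedy rule merely stands in for the learned policy.

```python
import numpy as np


def matching_score(K, x):
    """Affinity objective x^T K x for a 0/1 assignment vector x (assumed form)."""
    return float(x @ K @ x)


def greedy_sequential_matching(K, n1, n2):
    """Build a one-to-one matching by selecting one association-graph vertex per step.

    Vertex v = i * n2 + a pairs node i of the first graph with node a of the
    second (an indexing assumption). A greedy rule replaces the learned policy:
    each step takes the feasible vertex with the largest gain in x^T K x.
    """
    x = np.zeros(n1 * n2)
    used_i, used_a = set(), set()
    for _ in range(min(n1, n2)):
        best_gain, best_v = -np.inf, None
        for v in range(n1 * n2):
            i, a = divmod(v, n2)
            if i in used_i or a in used_a:
                continue  # would violate the one-to-one constraint
            # Exact increase of x^T K x when x[v] is flipped from 0 to 1.
            gain = K[v, v] + x @ K[:, v] + K[v, :] @ x
            if gain > best_gain:
                best_gain, best_v = gain, v
        x[best_v] = 1.0
        i, a = divmod(best_v, n2)
        used_i.add(i)
        used_a.add(a)
    return x.reshape(n1, n2)


if __name__ == "__main__":
    # Toy instance with two nodes per graph; the values are made up.
    K = np.diag([1.0, 0.0, 0.0, 1.0])
    K[0, 3] = K[3, 0] = 2.0  # pairwise affinity between pairs (0,0) and (1,1)
    X = greedy_sequential_matching(K, 2, 2)
    print(X, matching_score(K, X.reshape(-1)))
```

Because each step only masks pairs that are already used, the same loop applies to graphs of any size, which mirrors the point that the decision procedure is not tied to a fixed number of nodes.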