Abstract:We present a self-learning approach that combines deep reinforcement learning and Monte Carlo tree search to solve the traveling salesman problem. The proposed approach has two advantages. First, it adopts deep reinforcement learning to compute the value functions for decision, which removes the need of hand-crafted features and labelled data. Second, it uses Monte Carlo tree search to select the best policy by comparing different value functions, which increases its generalization ability. Experimental results show that the proposed method performs favorably against other methods in small-to-medium problem settings. And it shows comparable performance as state-of-the-art in large problem setting.