Title: Population-Based Reinforcement Learning for Combinatorial Optimization

Oct 07, 2022

Nathan Grinsztajn, Daniel Furelos-Blanco, Thomas D. Barrett

Abstract: Applying reinforcement learning (RL) to combinatorial optimization problems is attractive as it removes the need for expert knowledge or pre-solved instances. However, it is unrealistic to expect an agent to solve these (often NP-)hard problems in a single shot at inference due to their inherent complexity. Thus, leading approaches often implement additional search strategies, from stochastic sampling and beam-search to explicit fine-tuning. In this paper, we argue for the benefits of learning a population of complementary policies, which can be simultaneously rolled out at inference. To this end, we introduce Poppy, a simple theoretically grounded training procedure for populations. Instead of relying on a predefined or hand-crafted notion of diversity, Poppy induces an unsupervised specialization targeted solely at maximizing the performance of the population. We show that Poppy produces a set of complementary policies, and obtains state-of-the-art RL results on three popular NP-hard problems: the traveling salesman (TSP), the capacitated vehicle routing (CVRP), and 0-1 knapsack (KP) problems. On TSP specifically, Poppy outperforms the previous state-of-the-art, dividing the optimality gap by 5 while reducing the inference time by more than an order of magnitude.
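
The inference idea in the abstract, rolling out several policies on the same instance and keeping the best solution, can be illustrated with a minimal sketch on a toy TSP instance. This is not Poppy's implementation: `make_toy_policy` is a hypothetical stand-in (a noisy nearest-neighbour heuristic) for a trained policy, and `population_rollout`, `tour_length`, and all parameter values are assumptions chosen purely for illustration.

```python
import math
import random


def tour_length(cities, tour):
    """Total length of the closed tour that visits cities in the given order."""
    return sum(
        math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def make_toy_policy(temperature, seed):
    """Hypothetical stand-in for a trained policy: noisy nearest-neighbour."""
    rng = random.Random(seed)

    def policy(cities):
        unvisited = set(range(1, len(cities)))
        tour = [0]  # every toy policy starts from city 0
        while unvisited:
            current = cities[tour[-1]]
            # Pick the candidate with the smallest distance plus random noise;
            # the noise scale is what makes the toy policies differ.
            nxt = min(
                unvisited,
                key=lambda j: math.dist(current, cities[j])
                + temperature * rng.random(),
            )
            tour.append(nxt)
            unvisited.remove(nxt)
        return tour

    return policy


def population_rollout(policies, cities):
    """Roll out every policy once on the instance and keep the shortest tour."""
    tours = [policy(cities) for policy in policies]
    lengths = [tour_length(cities, t) for t in tours]
    best = min(range(len(policies)), key=lengths.__getitem__)
    return tours[best], lengths[best], lengths


# Toy instance: 20 random cities in the unit square, population of 4 policies.
rng = random.Random(0)
cities = [(rng.random(), rng.random()) for _ in range(20)]
policies = [make_toy_policy(temperature=0.05 * k, seed=k) for k in range(4)]
best_tour, best_length, all_lengths = population_rollout(policies, cities)
```

Under this view, the population's score on an instance is the score of its best member, so a population only needs one of its policies to do well on any given instance.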
