Joungho Kim

RL4CO: an Extensive Reinforcement Learning for Combinatorial Optimization Benchmark

Jun 29, 2023

Federico Berto, Chuanbo Hua, Junyoung Park, Minsu Kim, Hyeonah Kim, Jiwoo Son, Haeyeon Kim, Joungho Kim, Jinkyoo Park

Abstract: We introduce RL4CO, an extensive benchmark for reinforcement learning (RL) for combinatorial optimization (CO). RL4CO employs state-of-the-art software libraries and implementation best practices, such as modularity and configuration management, so that it is efficient and easy for researchers to modify when adapting neural network architectures, environments, and algorithms. In contrast to the existing focus on specific tasks such as the traveling salesman problem (TSP) for performance assessment, we underline the importance of scalability and generalization across diverse optimization tasks. We also systematically benchmark the sample efficiency, zero-shot generalization, and adaptability to changes in data distributions of various models. Our experiments show that some recent state-of-the-art methods fall behind their predecessors when evaluated using these new metrics, suggesting the need for a more balanced view of the performance of neural CO solvers. We hope RL4CO will encourage the exploration of novel solutions to complex real-world tasks, allowing comparison with existing methods through a standardized interface that decouples the science from the software engineering. We make our library publicly available at https://github.com/kaist-silab/rl4co.
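To illustrate what a standardized environment interface that decouples models from problem definitions can look like, here is a minimal Python sketch for TSP. The names `TSPEnv`, `TSPState`, `random_policy`, the `reset`/`step`/`action_mask`/`reward` methods, and the negative-tour-length reward are illustrative assumptions, not RL4CO's actual API; the repository linked above documents the real interface.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class TSPState:
    """Hypothetical state container: node coordinates, visited flags, partial tour."""
    coords: list
    visited: list
    tour: list = field(default_factory=list)


class TSPEnv:
    """Minimal TSP environment sketch with a reset/step interface (not RL4CO's API)."""

    def __init__(self, num_nodes: int, seed: int = 0):
        self.num_nodes = num_nodes
        self.rng = random.Random(seed)

    def reset(self) -> TSPState:
        # Sample node coordinates uniformly in the unit square.
        coords = [(self.rng.random(), self.rng.random()) for _ in range(self.num_nodes)]
        return TSPState(coords=coords, visited=[False] * self.num_nodes)

    def action_mask(self, state: TSPState) -> list:
        # A node is a valid action only if it has not been visited yet.
        return [not v for v in state.visited]

    def step(self, state: TSPState, action: int) -> bool:
        # Visit one node; returns True once every node has been visited.
        if state.visited[action]:
            raise ValueError(f"node {action} already visited")
        state.visited[action] = True
        state.tour.append(action)
        return all(state.visited)

    def reward(self, state: TSPState) -> float:
        # Reward is the negative length of the closed tour (return to the start node).
        length = 0.0
        for i in range(len(state.tour)):
            a = state.coords[state.tour[i]]
            b = state.coords[state.tour[(i + 1) % len(state.tour)]]
            length += math.dist(a, b)
        return -length


def random_policy(env: TSPEnv, state: TSPState, rng: random.Random) -> int:
    # A policy only needs the state and the mask, so models and environments stay decoupled.
    valid = [i for i, ok in enumerate(env.action_mask(state)) if ok]
    return rng.choice(valid)


if __name__ == "__main__":
    env = TSPEnv(num_nodes=10, seed=42)
    state = env.reset()
    done = False
    while not done:
        done = env.step(state, random_policy(env, state, env.rng))
    print(sorted(state.tour) == list(range(10)), env.reward(state))
```

Swapping `random_policy` for a learned model, or `TSPEnv` for another problem with the same methods, leaves the rollout loop unchanged, which is the kind of separation the abstract describes.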

Collaborative Distillation Meta Learning for Simulation Intensive Hardware Design

May 26, 2022

Haeyeon Kim, Minsu Kim, Joungho Kim, Jinkyoo Park

Abstract: This paper proposes a novel collaborative distillation meta learning (CDML) framework for simulation-intensive hardware design problems. Deep reinforcement learning (DRL) has shown promising performance in various hardware design problems. However, previous works on DRL-based hardware design only dealt with problems with simplified objectives, which are not practical. In fact, evaluating real-world electrical performance through simulation is costly in terms of both time and computation, which makes DRL schemes that require extensive reward calculations unsuitable. In this paper, we apply the CDML framework to the decoupling capacitor placement problem (DPP), one of the significant simulation-intensive hardware design problems. The CDML framework consists of a context-based meta learner and a collaborative distillation scheme that together produce a reusable solver. The context-based meta learner captures the location of the probing port (i.e., the target circuit block) and improves generalization capability. The collaborative distillation scheme with equivariant label transformation imposes the action-permutation (AP)-equivariant nature of placement problems, which improves both sample efficiency and generalization capability. Extensive experimental results verify that CDML outperforms both neural baselines and conventional iterative design methods in terms of the real-world objective, power integrity, with zero-shot transferability.
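The action-permutation idea can be illustrated with a small sketch. Assuming, as the abstract describes for placement problems, that the objective depends only on which ports receive decaps and not on the order in which they are placed, any permutation of one expert placement sequence is an equally valid imitation target. The functions below (`placement_objective`, `permuted_labels`, `step_targets`) and the per-port scores are hypothetical; they show only this label transformation, not the CDML meta learner or the distillation procedure itself.

```python
import itertools
import random


def placement_objective(placed: frozenset, port_scores: dict) -> float:
    # Hypothetical stand-in objective: it depends only on the SET of placed ports,
    # so every ordering of the same placement actions receives the same score.
    return sum(port_scores[p] for p in placed)


def permuted_labels(action_seq, num_samples=None, seed=0):
    """Return action-permuted copies of one expert placement sequence.

    With num_samples=None, all permutations are enumerated (only feasible for
    short sequences); otherwise num_samples random shuffles are drawn.
    """
    if num_samples is None:
        return [list(p) for p in itertools.permutations(action_seq)]
    rng = random.Random(seed)
    labels = []
    for _ in range(num_samples):
        perm = list(action_seq)
        rng.shuffle(perm)
        labels.append(perm)
    return labels


def step_targets(label):
    # Turn one sequence into (current partial placement, next action) training pairs.
    return [(frozenset(label[:t]), label[t]) for t in range(len(label))]
```

Each permuted label yields a different set of (partial placement, next action) pairs for the same final placement, which is one way an order-independent objective can multiply the supervision extracted from a single costly solution.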

* 29 pages, 19 figures

Transformer Network-based Reinforcement Learning Method for Power Distribution Network (PDN) Optimization of High Bandwidth Memory (HBM)

Mar 29, 2022

Hyunwook Park, Minsu Kim, Seongguk Kim, Keunwoo Kim, Haeyeon Kim, Taein Shin, Keeyoung Son, Boogyo Sim, Subin Kim, Seungtaek Jeong (+2 more)

Abstract: In this article, for the first time, we propose a transformer network-based reinforcement learning (RL) method for power distribution network (PDN) optimization of high bandwidth memory (HBM). The proposed method can provide an optimal decoupling capacitor (decap) design that maximizes the reduction of PDN self- and transfer impedance seen at multiple ports. An attention-based transformer network is implemented to directly parameterize the decap optimization policy. Optimality is significantly improved because the attention mechanism is expressive enough to explore the massive combinatorial space of decap assignments. Moreover, it can capture sequential relationships between decap assignments. The computing time for optimization is dramatically reduced because the network is reusable across positions of probing ports and decap assignment candidates. This is because the transformer network has a context embedding process that captures meta-features, including probing port positions. In addition, the network is trained with randomly generated data sets. Therefore, without additional training, the trained network can solve new decap optimization problems. The computing time for training and the data cost are critically decreased due to the scalability of the network. Thanks to its shared-weight property, the network can adapt to larger-scale problems without additional training. For verification, we compare the results with a conventional genetic algorithm (GA), random search (RS), and all previous RL-based methods. As a result, the proposed method outperforms them in all of the following aspects: optimality performance, computing time, and data efficiency.
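The core of an attention-parameterized decap policy is a masked scoring step over candidate ports. The following pure-Python sketch works under simplifying assumptions: a single attention head, precomputed embeddings, a fixed context query (for example, one encoding the probing-port position) rather than one updated after each assignment, and greedy decoding. It shows how already-assigned ports are masked out so that every step yields a valid distribution. The function names are hypothetical and are not taken from the paper.

```python
import math


def masked_attention_policy(query, keys, mask):
    """Single-head, pointer-style scoring of candidate decap ports.

    query: context embedding (list of floats)
    keys:  one embedding per candidate port (list of lists of floats)
    mask:  True where the port is still available
    Returns a probability distribution over candidate ports.
    """
    if not any(mask):
        raise ValueError("no candidate ports left to assign")
    scale = math.sqrt(len(query))
    # Scaled dot-product compatibility; unavailable ports get -inf.
    scores = [
        sum(q * k for q, k in zip(query, key)) / scale if ok else -math.inf
        for key, ok in zip(keys, mask)
    ]
    m = max(scores)  # finite, because at least one port is available
    # Numerically stable softmax; masked ports receive exactly zero probability.
    exps = [math.exp(s - m) if ok else 0.0 for s, ok in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_decode(query, keys, num_decaps):
    # Assign decaps one at a time, always taking the most probable available port.
    mask = [True] * len(keys)
    chosen = []
    for _ in range(num_decaps):
        probs = masked_attention_policy(query, keys, mask)
        a = max(range(len(probs)), key=probs.__getitem__)
        chosen.append(a)
        mask[a] = False
    return chosen
```

In a trained model, the keys and query would come from learned encoders and the query would typically be refreshed after each assignment to reflect the partial solution; sampling from `probs` instead of taking the argmax gives the stochastic policy used during RL training.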

* 16 pages, 13 figures, Under review as a journal paper at IEEE Transactions on Microwave Theory and Techniques (TMTT)

Learning Collaborative Policies to Solve NP-hard Routing Problems

Oct 26, 2021

Minsu Kim, Jinkyoo Park, Joungho Kim

Abstract: Recently, deep reinforcement learning (DRL) frameworks have shown potential for solving NP-hard routing problems such as the traveling salesman problem (TSP) without problem-specific expert knowledge. Although DRL can be used to solve complex problems, DRL frameworks still struggle to compete with state-of-the-art heuristics, showing a substantial performance gap. This paper proposes a novel hierarchical problem-solving strategy, termed learning collaborative policies (LCP), which can effectively find near-optimal solutions using two iterative DRL policies: the seeder and the reviser. The seeder generates candidate solutions (seeds) that are as diverse as possible while exploring the full combinatorial action space (i.e., the sequence of assignment actions). To this end, we train the seeder's policy using a simple yet effective entropy regularization reward that encourages the seeder to find diverse solutions. The reviser, in turn, modifies each candidate solution generated by the seeder: it partitions the full trajectory into sub-tours and simultaneously revises each sub-tour to minimize its traveling distance. Thus, the reviser is trained to improve the quality of the candidate solution by focusing on a reduced solution space (which is beneficial for exploitation). Extensive experiments demonstrate that the proposed two-policy collaboration scheme improves over single-policy DRL frameworks on various NP-hard routing problems, including TSP, the prize collecting TSP (PCTSP), and the capacitated vehicle routing problem (CVRP).
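The reviser's partition-and-revise step can be sketched as follows. Each sub-tour keeps its two endpoints fixed, so the revised pieces always reconnect into a valid tour. As a stand-in for the learned reviser policy, this sketch uses classical 2-opt within each segment, and it processes segments sequentially, although they are independent and could be revised in parallel. The function names and the `sub_len` parameter are illustrative assumptions, not the paper's implementation.

```python
import math


def path_length(points, seq):
    # Length of an open path visiting points in the order given by seq.
    return sum(math.dist(points[seq[i]], points[seq[i + 1]]) for i in range(len(seq) - 1))


def two_opt_segment(points, seg):
    """Improve an open path with fixed endpoints via 2-opt (stand-in for a learned reviser)."""
    best = list(seg)
    improved = True
    while improved:
        improved = False
        # i >= 1 and j <= len - 2 keep the first and last nodes in place.
        for i in range(1, len(best) - 2):
            for j in range(i + 1, len(best) - 1):
                cand = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                if path_length(points, cand) < path_length(points, best) - 1e-12:
                    best = cand
                    improved = True
    return best


def revise_tour(points, tour, sub_len):
    """Partition a closed tour into consecutive sub-tours and revise each one.

    Consecutive sub-tours share an endpoint, and endpoints never move, so the
    revised sub-tours concatenate back into a permutation of the original nodes.
    """
    closed = list(tour) + [tour[0]]
    revised = [closed[0]]
    for start in range(0, len(closed) - 1, sub_len):
        seg = closed[start:start + sub_len + 1]
        revised.extend(two_opt_segment(points, seg)[1:])
    return revised[:-1]  # drop the repeated start node
```

Because each sub-tour's length can only decrease and the endpoints are shared, the total tour length never increases; restricting each revision to a short segment is what shrinks the search space, mirroring the exploitation role the abstract assigns to the reviser.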

* NeurIPS 2021, 23 pages, 8 figures
