Abstract: Can the success of reinforcement learning methods for combinatorial optimization problems be extended to multi-robot scheduling problems in stochastic contexts? Three issues are particularly important here: the quality of the resulting decisions, scalability, and transferability. To address them, we generalize the concept of clique potential to that of stochastic clique potential, extend a mean-field inference fixed-point iteration with this new concept, and use it to modify the structure2vec method. We then propose a new reinforcement learning framework combining a graph representation of the problem with a consensus auction inspired by heuristics in the problem domain. This representation enables transferability in terms of the number of robots. Sequential encoding of information through multiple layers of our extended structure2vec yields learned heuristics that achieve 96% of optimal performance. While training tractability is inherited from single-robot methods in the literature, a multi-robot consensus-auction-based relaxation of the maximum operation in the Bellman optimality equation allows scalable action selection in fitted Q-iteration. We apply the framework to multi-robot reward collection (MRRC) problems in stochastic environments with linear or non-linear rewards. In stochastic environments with non-linear rewards, the new method outperforms the popular sequential greedy assignment (SGA) algorithm by 20%. Linear scalability of training is achieved and demonstrated. Transferability is demonstrated by a heuristic trained with three robots that retains 95% of optimal performance when applied to problems with varying numbers of robots. We also report results from extending the approach to identical parallel machine scheduling (IPMS) problems.
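
To make the auction-based relaxation of the Bellman maximum concrete, the following minimal Python sketch replaces the exact maximum over joint robot-task assignments with a greedy auction over per-robot Q-value bids. It is an illustration of the general idea only, not the paper's exact consensus-auction procedure: the function name auction_argmax, the bid-matrix representation, and the greedy winner-takes-task rule are all assumptions introduced here.

    import numpy as np

    def auction_argmax(q_values):
        """Greedy auction-style relaxation of the max over joint assignments.

        q_values: (n_robots, n_tasks) array of estimated Q-values, treated as
        bids. Returns a robot->task assignment and its total bid value.
        Illustrative sketch; the paper's consensus auction may differ.
        """
        q = q_values.astype(float).copy()
        assignment = {}
        total = 0.0
        for _ in range(min(q.shape)):
            # The highest remaining bid wins its task.
            r, t = np.unravel_index(np.argmax(q), q.shape)
            assignment[r] = t
            total += q[r, t]
            q[r, :] = -np.inf  # robot r is committed
            q[:, t] = -np.inf  # task t is taken
        return assignment, total

    # In fitted Q-iteration, the regression target would then use the
    # auction value in place of an exact (exponential-size) maximum:
    #   y = reward + gamma * auction_argmax(Q_theta(next_state))[1]

Because the greedy loop runs at most min(n_robots, n_tasks) times over a single bid matrix, action selection scales polynomially rather than enumerating the exponentially many joint assignments, which is the scalability property the abstract refers to.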