Abstract: Reinforcement learning (RL) has achieved state-of-the-art performance in many scientific and applied problems. However, some complex tasks are still difficult to handle with a single model and algorithm. Ensemble reinforcement learning (ERL), which combines reinforcement learning with ensemble learning (EL), has become a popular and important method for handling such complex tasks. ERL combines several models or training algorithms to explore the problem space thoroughly and has strong generalization characteristics. This study presents a comprehensive survey of ERL to give readers an overview of recent advances and challenges. The background is introduced first. The strategies successfully applied in ERL are then analyzed in detail. Finally, we outline some open questions and conclude by discussing future research directions for ERL. This survey contributes to ERL development by providing a guide for future scientific research and engineering applications.
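A minimal sketch of the "combine several models" idea, assuming the ensemble members are value-based agents that each output one Q-value per action. The two combination rules shown (averaging Q-values, and majority voting over each member's greedy action) are common textbook choices used only for illustration; they are not taken from the survey, and the toy linear models built by the hypothetical helper `make_linear_model` are placeholders.

```python
from collections import Counter
from typing import Callable, List, Sequence

# A "model" is any function mapping a state to one Q-value per action.
QModel = Callable[[Sequence[float]], List[float]]


def average_q_action(models: List[QModel], state: Sequence[float]) -> int:
    """Pick the action with the highest Q-value averaged over all members."""
    q_values = [m(state) for m in models]
    n_actions = len(q_values[0])
    mean_q = [sum(q[a] for q in q_values) / len(q_values) for a in range(n_actions)]
    return max(range(n_actions), key=lambda a: mean_q[a])


def majority_vote_action(models: List[QModel], state: Sequence[float]) -> int:
    """Each member votes for its own greedy action; ties go to the lowest index."""
    votes: Counter = Counter()
    for m in models:
        q = m(state)
        votes[max(range(len(q)), key=lambda a: q[a])] += 1
    best = max(votes.values())
    return min(a for a, c in votes.items() if c == best)


def make_linear_model(weights: List[List[float]]) -> QModel:
    """Hypothetical toy member: Q(s, a) = sum_i weights[a][i] * s[i]."""
    return lambda s: [sum(w_i * s_i for w_i, s_i in zip(w, s)) for w in weights]


models = [
    make_linear_model([[1.0, 0.0], [0.0, 1.0]]),  # Q = [1.0, 1.0] -> greedy 0
    make_linear_model([[0.5, 0.5], [0.0, 0.2]]),  # Q = [1.0, 0.2] -> greedy 0
    make_linear_model([[0.0, 0.1], [2.0, 2.0]]),  # Q = [0.1, 4.0] -> greedy 1
]
state = [1.0, 1.0]
print(average_q_action(models, state))      # 1
print(majority_vote_action(models, state))  # 0
```

With this toy state the two rules disagree: averaging lets one very confident member dominate, while voting weights every member equally. Which rule suits a given task is a design choice, not something this sketch settles.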
Abstract: The vehicle routing problem (VRP) is a typical discrete combinatorial optimization problem, and many models and algorithms have been proposed to solve the VRP and its variants. Although existing approaches have contributed a great deal to the field, they are either limited in the problem size they can handle or require manual intervention to choose parameters. To tackle these difficulties, many studies have turned to learning-based optimization (LBO) algorithms to solve the VRP. This paper reviews recent advances in this field and divides the relevant approaches into end-to-end approaches and step-by-step approaches. We design a three-part experiment to fairly evaluate the performance of four representative LBO algorithms and conclude that combining heuristic search can effectively improve the learning ability and sample efficiency of LBO models. Finally, we point out that the research trend for LBO algorithms is toward solving large-scale, multi-constraint real-world problems.
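A minimal sketch of what "combining heuristic search" with a step-by-step constructor can look like, restricted for brevity to a single uncapacitated vehicle (effectively a TSP tour from the depot, node 0). The greedy nearest-neighbor constructor is a hypothetical stand-in for a learned step-by-step policy, and 2-opt is a classic local search chosen here for illustration; neither is one of the four algorithms evaluated in the paper.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def route_length(route: List[int], pts: List[Point]) -> float:
    """Length of the closed tour that starts and ends at route[0] (the depot)."""
    return sum(math.dist(pts[route[i]], pts[route[(i + 1) % len(route)]])
               for i in range(len(route)))


def greedy_construct(pts: List[Point]) -> List[int]:
    """Step-by-step construction: always visit the nearest unvisited customer.
    Hypothetical placeholder for a learned constructive policy."""
    route, unvisited = [0], set(range(1, len(pts)))
    while unvisited:
        last = route[-1]
        nxt = min(unvisited, key=lambda j: math.dist(pts[last], pts[j]))
        route.append(nxt)
        unvisited.remove(nxt)
    return route


def two_opt(route: List[int], pts: List[Point]) -> List[int]:
    """Heuristic search: reverse route segments while doing so shortens the tour.
    Index 0 (the depot) is never moved."""
    best = route[:]
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                cand = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                if route_length(cand, pts) < route_length(best, pts) - 1e-12:
                    best, improved = cand, True
    return best


# Hypothetical customer coordinates; node 0 is the depot.
pts = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0), (1.0, 0.2), (1.0, 1.8)]
initial = greedy_construct(pts)
improved = two_opt(initial, pts)
print(initial, round(route_length(initial, pts), 3))
print(improved, round(route_length(improved, pts), 3))
```

The local search can only keep or shorten the constructed tour, which is one intuition for why pairing a constructor with search tends to help; a real LBO pipeline would also handle capacities, multiple vehicles, and learned guidance.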
Abstract: Scarcely any general solver is efficient across scheduling problems, owing to their diversity and complexity. In this study, we develop a two-stage framework in which reinforcement learning (RL) and traditional operations research (OR) algorithms are combined to deal efficiently with complex scheduling problems. The scheduling problem is solved in two stages: a finite Markov decision process (MDP) and a mixed-integer programming process. This offers a novel and general paradigm for solving scheduling problems that combines RL with OR approaches and leverages the respective strengths of each: the MDP stage narrows the search space of the original problem through an RL method, while the mixed-integer programming stage is solved by an OR algorithm. The two stages are performed iteratively and interactively until the termination criterion is met. Based on this idea, two implementations combining RL and OR are put forward. The agile Earth observation satellite scheduling problem is selected as an example to demonstrate the effectiveness of the proposed framework and methods. The convergence and generalization capability of the methods are verified by their performance on the training scenarios, while their efficiency and accuracy are tested on 50 untrained scenarios. The results show that the proposed algorithms can stably and efficiently obtain satisfactory schedules for agile Earth observation satellite scheduling problems. In addition, the RL-based optimization algorithms are found to scale better than non-learning algorithms. This work reveals the advantage of combining reinforcement learning methods with heuristic methods or mathematical programming methods for solving complex combinatorial optimization problems.
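A minimal sketch of the iterative two-stage loop on a hypothetical toy problem: each task has a profit and a duration, and the chosen tasks must fit a time budget. Stage 1 narrows the task set to the `k` highest-scoring candidates, standing in for the RL policy; stage 2 solves the reduced problem exactly by brute force, standing in for the mixed-integer programming solver. The score update is a crude feedback rule assumed for illustration, not the paper's RL method, and a fixed iteration count serves as the termination criterion.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical toy task: (name, profit, duration).
Task = Tuple[str, float, float]


def stage1_narrow(tasks: List[Task], scores: Dict[str, float], k: int) -> List[Task]:
    """Stage 1 stand-in for the RL policy: keep only the k highest-scoring tasks."""
    return sorted(tasks, key=lambda t: scores[t[0]], reverse=True)[:k]


def stage2_solve(cands: List[Task], budget: float) -> Tuple[float, List[str]]:
    """Stage 2 stand-in for the MIP solver: exact search over the reduced set."""
    best: Tuple[float, List[str]] = (0.0, [])
    for r in range(1, len(cands) + 1):
        for combo in combinations(cands, r):
            if sum(t[2] for t in combo) <= budget:
                profit = sum(t[1] for t in combo)
                if profit > best[0]:
                    best = (profit, [t[0] for t in combo])
    return best


def two_stage(tasks: List[Task], budget: float, k: int = 3,
              iters: int = 5, lr: float = 0.5) -> Tuple[float, List[str]]:
    # Initial scores: profit per unit duration (a simple hand-made heuristic).
    scores = {name: profit / dur for name, profit, dur in tasks}
    best: Tuple[float, List[str]] = (0.0, [])
    for _ in range(iters):  # termination criterion: fixed iteration budget
        cands = stage1_narrow(tasks, scores, k)
        profit, chosen = stage2_solve(cands, budget)
        if profit > best[0]:
            best = (profit, chosen)
        # Feedback: raise scores of candidates the solver used, lower the rest.
        for name, _, _ in cands:
            scores[name] += lr if name in chosen else -lr
    return best


tasks = [("a", 10, 4), ("b", 7, 3), ("c", 6, 2), ("d", 3, 1), ("e", 9, 5)]
profit, chosen = two_stage(tasks, budget=7)
print(profit, chosen)  # 19 ['c', 'd', 'a']
```

The point of the split is that the exact solver only ever sees `k` tasks, so its cost stays bounded while stage 1 decides where to look.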
Abstract: In this paper, we study a new representation-learning task, which we term disassembling object representations. Given an image featuring multiple objects, the goal of disassembling is to acquire a latent representation in which each part corresponds to one category of objects. Disassembling thus finds application in a wide range of domains, such as image editing and few- or zero-shot learning, as it enables category-specific modularity in the learned representations. To this end, we propose an unsupervised approach to disassembling, named Unsupervised Disassembling Object Representation (UDOR). UDOR follows a double auto-encoder architecture on which a fuzzy classification and an object-removing operation are imposed. The fuzzy classification constrains each part of the latent representation to encode features of at most one object category, while the object-removing operation, combined with a generative adversarial network, enforces the modularity of the representations and the integrity of the reconstructed image. Furthermore, we devise two metrics to measure, respectively, the modularity of the disassembled representations and the visual integrity of the reconstructed images. Experimental results demonstrate that UDOR, despite being unsupervised, achieves encouraging results on par with those of supervised methods.
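A minimal sketch of what a disassembled latent makes possible, assuming the latent is a flat vector split into equal-sized parts, one per object category. Object removal is modeled here as zeroing one category's part, and editing as copying one part from another image's latent; both operations, the part layout, and the helper names are hypothetical illustrations, not UDOR's actual mechanism. The encoder, decoder, fuzzy classification, and GAN are all omitted: in a trained model, a decoder would map the edited latents back to images.

```python
from typing import List

Latent = List[float]


def split_parts(z: Latent, num_parts: int) -> List[Latent]:
    """Split a flat latent vector into equal-sized parts, one per object category."""
    if len(z) % num_parts:
        raise ValueError("latent size must be divisible by num_parts")
    d = len(z) // num_parts
    return [z[k * d:(k + 1) * d] for k in range(num_parts)]


def join_parts(parts: List[Latent]) -> Latent:
    """Concatenate per-category parts back into one flat latent vector."""
    return [v for part in parts for v in part]


def remove_category(z: Latent, num_parts: int, category: int) -> Latent:
    """Hypothetical object removal: zero out the part assigned to one category."""
    parts = split_parts(z, num_parts)
    parts[category] = [0.0] * len(parts[category])
    return join_parts(parts)


def transplant_category(z_dst: Latent, z_src: Latent,
                        num_parts: int, category: int) -> Latent:
    """Hypothetical editing step: copy one category's part from another latent."""
    dst, src = split_parts(z_dst, num_parts), split_parts(z_src, num_parts)
    dst[category] = list(src[category])
    return join_parts(dst)


# Toy 6-dimensional latents with 3 categories (2 dimensions each).
z_a = [1.0, 2.0, 0.3, 0.4, 0.5, 0.6]
z_b = [9.0, 9.0, 8.0, 8.0, 7.0, 7.0]
print(remove_category(z_a, 3, 0))           # [0.0, 0.0, 0.3, 0.4, 0.5, 0.6]
print(transplant_category(z_a, z_b, 3, 2))  # [1.0, 2.0, 0.3, 0.4, 7.0, 7.0]
```

Because each edit touches only one category's slice, the other categories' features pass through unchanged, which is the category-specific modularity the abstract describes.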