Jianyong Sun

Learning to Insert for Constructive Neural Vehicle Routing Solver

May 20, 2025

Fu Luo, Xi Lin, Mengyuan Zhong, Fei Liu, Zhenkun Wang, Jianyong Sun, Qingfu Zhang

Abstract: Neural Combinatorial Optimisation (NCO) is a promising learning-based approach for solving Vehicle Routing Problems (VRPs) without extensive manual design. While existing constructive NCO methods typically follow an appending-based paradigm that sequentially adds unvisited nodes to partial solutions, this rigid approach often leads to suboptimal results. To overcome this limitation, we explore the idea of an insertion-based paradigm and propose Learning to Construct with Insertion-based Paradigm (L2C-Insert), a novel learning-based method for constructive NCO. Unlike traditional approaches, L2C-Insert builds solutions by strategically inserting unvisited nodes at any valid position in the current partial solution, which can significantly enhance the flexibility and solution quality. The proposed framework introduces three key components: a novel model architecture for precise insertion position prediction, an efficient training scheme for model optimization, and an advanced inference technique that fully exploits the insertion paradigm's flexibility. Extensive experiments on both synthetic and real-world instances of the Travelling Salesman Problem (TSP) and Capacitated Vehicle Routing Problem (CVRP) demonstrate that L2C-Insert consistently achieves superior performance across various problem sizes.
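To make the contrast between appending and inserting concrete, here is a minimal sketch for a small TSP instance. It is not the L2C-Insert model: a greedy cheapest-position rule stands in for the learned insertion policy, and the function names (`insertion_construct`, `appending_construct`, `tour_length`) are hypothetical. Nodes arrive in a fixed order in both cases; only the place where each node is put differs.

```python
import math

def tour_length(tour, pts):
    """Length of the closed tour that visits pts in the given order."""
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def appending_construct(pts):
    """Appending-style baseline: each new node goes at the end of the partial tour."""
    return list(range(len(pts)))

def insertion_construct(pts):
    """Insertion-style construction: each new node may go between any two
    consecutive nodes. Assumption: a greedy cost rule replaces the neural policy."""
    tour = [0, 1]  # start from a two-node partial tour
    for node in range(2, len(pts)):
        best_pos, best_delta = 0, math.inf
        for i in range(len(tour)):
            a, b = tour[i], tour[(i + 1) % len(tour)]
            # Extra length caused by placing `node` between a and b.
            delta = (math.dist(pts[a], pts[node]) + math.dist(pts[node], pts[b])
                     - math.dist(pts[a], pts[b]))
            if delta < best_delta:
                best_pos, best_delta = i + 1, delta
        tour.insert(best_pos, node)
    return tour

# Corners of a unit square, listed in an order that crosses the diagonal.
points = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
appended = appending_construct(points)
inserted = insertion_construct(points)
```

On this instance the appended tour crosses the square twice along its diagonals, while the inserted tour follows the perimeter.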

Learning from Few Demonstrations with Frame-Weighted Motion Generation

Mar 29, 2023

Jianyong Sun, Jihong Zhu, Jens Kober, Michael Gienger

Abstract: Learning from Demonstration (LfD) aims to encode versatile skills from human demonstrations. The field has been gaining popularity since it facilitates knowledge transfer to robots without requiring expert knowledge in robotics. During task executions, the robot motion is usually influenced by constraints imposed by environments. In light of this, task-parameterized LfD (TP-LfD) encodes relevant contextual information in reference frames, enabling better skill generalization to new situations. However, most TP-LfD algorithms require multiple demonstrations in various environment conditions to ensure sufficient statistics for a meaningful model. It is not a trivial task for robot users to create different situations and perform demonstrations under all of them. Therefore, this paper presents a novel concept for learning motion policies from few demonstrations by finding the reference frame weights which capture frame importance/relevance during task executions. Experimental results in both simulation and real robotic environments validate our approach.

* Submitted to RA-L

Learning adaptive differential evolution algorithm from optimization experiences by policy gradient

Feb 06, 2021

Jianyong Sun, Xin Liu, Thomas Bäck, Zongben Xu

Abstract: Differential evolution is one of the most prestigious population-based stochastic optimization algorithms for black-box problems. The performance of a differential evolution algorithm depends highly on its mutation and crossover strategy and associated control parameters. However, determining the most suitable parameter setting is troublesome and time-consuming. Adaptive control parameter methods that can adapt to the problem landscape and optimization environment are preferable to fixed parameter settings. This paper proposes a novel adaptive parameter control approach based on learning from the optimization experiences over a set of problems. In the approach, the parameter control is modeled as a finite-horizon Markov decision process. A reinforcement learning algorithm, named policy gradient, is applied to learn an agent (i.e. parameter controller) that can provide the control parameters of a proposed differential evolution adaptively during the search procedure. The differential evolution algorithm based on the learned agent is compared against nine well-known evolutionary algorithms on the CEC'13 and CEC'17 test suites. Experimental results show that the proposed algorithm performs competitively against these compared algorithms on the test suites.
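For readers unfamiliar with the control parameters in question, the sketch below shows one generation of the classic DE/rand/1/bin scheme, where the scale factor and crossover rate are plain arguments. This is an illustration under assumptions: the abstract does not say which DE variant the paper uses, the learned controller is not reproduced (fixed values are passed in its place), and the names `de_generation`, `f_scale`, `cr` and `sphere` are hypothetical.

```python
import random

def sphere(x):
    """Simple test objective: sum of squares, minimised at the origin."""
    return sum(v * v for v in x)

def de_generation(pop, fitness, f_scale, cr, rng):
    """One DE/rand/1/bin generation. f_scale and cr are the control parameters
    that an adaptive controller would supply; here the caller fixes them."""
    n, dim = len(pop), len(pop[0])
    new_pop = []
    for i, target in enumerate(pop):
        # Mutation: combine three distinct individuals other than the target.
        r1, r2, r3 = rng.sample([j for j in range(n) if j != i], 3)
        mutant = [pop[r1][d] + f_scale * (pop[r2][d] - pop[r3][d]) for d in range(dim)]
        # Binomial crossover: at least one coordinate comes from the mutant.
        j_rand = rng.randrange(dim)
        trial = [mutant[d] if (rng.random() < cr or d == j_rand) else target[d]
                 for d in range(dim)]
        # Greedy selection: keep the trial only if it is no worse.
        new_pop.append(trial if fitness(trial) <= fitness(target) else target)
    return new_pop

rng = random.Random(0)
population = [[rng.uniform(-5.0, 5.0) for _ in range(3)] for _ in range(10)]
initial_best = min(sphere(x) for x in population)
for _ in range(50):
    population = de_generation(population, sphere, f_scale=0.5, cr=0.9, rng=rng)
final_best = min(sphere(x) for x in population)
```

Because selection is greedy per individual, the best fitness in the population can never get worse from one generation to the next; an adaptive scheme would change `f_scale` and `cr` between calls instead of holding them constant.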

Amortized Variational Deep Q Network

Nov 03, 2020

Haotian Zhang, Yuhao Wang, Jianyong Sun, Zongben Xu

Abstract: Efficient exploration is one of the most important issues in deep reinforcement learning. To address this issue, recent methods consider the value function parameters as random variables, and resort to variational inference to approximate the posterior of the parameters. In this paper, we propose an amortized variational inference framework to approximate the posterior distribution of the action value function in Deep Q Network. We establish the equivalence between the loss of the new model and the amortized variational inference loss. We realize the balance of exploration and exploitation by assuming the posterior as Cauchy and Gaussian, respectively, in a two-stage training process. We show that the amortized framework can result in significantly fewer learnable parameters than existing state-of-the-art methods. Experimental results on classical control tasks in OpenAI Gym and chain Markov Decision Process tasks show that the proposed method performs significantly better than state-of-the-art methods and requires much less training time.

* Accepted to appear in the Deep Reinforcement Learning Workshop at NeurIPS 2020

Graph Neural Network Encoding for Community Detection in Attribute Networks

Jun 06, 2020

Jianyong Sun, Wei Zheng, Qingfu Zhang, Zongben Xu

Abstract: In this paper, we first propose a graph neural network encoding method for a multiobjective evolutionary algorithm to handle the community detection problem in complex attribute networks. In the graph neural network encoding method, each edge in an attribute network is associated with a continuous variable. Through non-linear transformation, a continuous valued vector (i.e. a concatenation of the continuous variables associated with all edges) is transferred to a discrete valued community grouping solution. Further, two new objective functions for single- and multi-attribute networks are proposed to evaluate the attribute homogeneity of the nodes in communities, respectively. Based on the new encoding method and the two new objectives, a multiobjective evolutionary algorithm (MOEA) based upon NSGA-II, termed continuous encoding MOEA, is developed for the transformed community detection problem with continuous decision variables. Experimental results on single- and multi-attribute real-life networks of different types show that the developed algorithm performs significantly better than some well-known evolutionary and non-evolutionary based algorithms. The fitness landscape analysis verifies that the transformed community detection problems have smoother landscapes than those of the original problems, which justifies the effectiveness of the proposed graph neural network encoding method.

On Hyper-parameter Tuning for Stochastic Optimization Algorithms

Mar 10, 2020

Haotian Zhang, Jianyong Sun, Zongben Xu

Abstract: This paper proposes the first-ever algorithmic framework for tuning hyper-parameters of stochastic optimization algorithms based on reinforcement learning. Hyper-parameters impose significant influences on the performance of stochastic optimization algorithms, such as evolutionary algorithms (EAs) and meta-heuristics. Yet, it is very time-consuming to determine optimal hyper-parameters due to the stochastic nature of these algorithms. We propose to model the tuning procedure as a Markov decision process, and resort to the policy gradient algorithm to tune the hyper-parameters. Experiments on tuning stochastic algorithms with different kinds of hyper-parameters (continuous and discrete) for different optimization problems (continuous and discrete) show that the proposed hyper-parameter tuning algorithms do not require much less running times of the stochastic algorithms than the Bayesian optimization method. The proposed framework can be used as a standard tool for hyper-parameter tuning in stochastic algorithms.

* Our explanation of reinforcement learning for adjustment algorithm is far fetched in Section ?B

Learning to be Global Optimizer

Mar 10, 2020

Haotian Zhang, Jianyong Sun, Zongben Xu

Abstract: The advancement of artificial intelligence has cast a new light on the development of optimization algorithms. This paper proposes to learn a two-phase (including a minimization phase and an escaping phase) global optimization algorithm for smooth non-convex functions. For the minimization phase, a model-driven deep learning method is developed to learn the update rule of descent direction, which is formalized as a nonlinear combination of historical information, for convex functions. We prove that the resultant algorithm with the proposed adaptive direction guarantees convergence for convex functions. Empirical study shows that the learned algorithm significantly outperforms some well-known classical optimization algorithms, such as gradient descent, conjugate descent and BFGS, and performs well on ill-posed functions. The escaping phase from local optima is modeled as a Markov decision process with a fixed escaping policy. We further propose to learn an optimal escaping policy by reinforcement learning. The effectiveness of the escaping policies is verified by optimizing synthesized functions and training a deep neural network for CIFAR image classification. The learned two-phase global optimization algorithm demonstrates a promising global search capability on some benchmark functions and machine learning tasks.

Adaptive Structural Hyper-Parameter Configuration by Q-Learning

Mar 02, 2020

Haotian Zhang, Jianyong Sun, Zongben Xu

Abstract: Tuning hyper-parameters for evolutionary algorithms is an important issue in computational intelligence. The performance of an evolutionary algorithm depends not only on its operation strategy design, but also on its hyper-parameters. Hyper-parameters can be categorized in two dimensions as structural/numerical and time-invariant/time-variant. In particular, structural hyper-parameters in existing studies are usually tuned in advance for time-invariant parameters, or with hand-crafted scheduling for time-variant parameters. In this paper, we make the first attempt to model the tuning of structural hyper-parameters as a reinforcement learning problem, and propose to tune the structural hyper-parameter which controls computational resource allocation in the CEC 2018 winner algorithm by Q-learning. Experimental results compare favorably against the winner algorithm on the CEC 2018 test functions.
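As a reminder of what Q-learning does at each step, the sketch below applies the standard tabular update to a toy table. The states, actions and reward here are hypothetical placeholders; the abstract does not specify how the paper encodes the search state or the resource-allocation choices, so nothing below should be read as its actual design.

```python
def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning step:
    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))."""
    best_next = max(q[next_state].values())
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
    return q[state][action]

# Hypothetical table: two search stages, two candidate settings of a
# structural hyper-parameter ("small" or "large" resource share).
q_table = {
    "early": {"small": 0.0, "large": 0.0},
    "late": {"small": 1.0, "large": 0.0},
}
updated = q_update(q_table, "early", "small", reward=1.0, next_state="late")
```

With these numbers the updated entry is 0.1 * (1.0 + 0.9 * 1.0 - 0.0) = 0.19, which shows how the value of a later state feeds back into the choice made earlier.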

Multi-Objectivizing Sum-of-the-Parts Combinatorial Optimization Problems by Random Objective Decomposition

Nov 19, 2019

Jialong Shi, Jianyong Sun, Qingfu Zhang

Abstract: Multi-objectivization is a term used to describe strategies developed for optimizing single-objective problems by multi-objective algorithms. This paper focuses on the multi-objectivization of the sum-of-the-parts Combinatorial Optimization Problems (COPs), which include the Traveling Salesman Problem (TSP), the Unconstrained Binary Quadratic Programming (UBQP) and other well-known COPs. For a sum-of-the-parts COP, we propose to decompose its original objective into two sub-objectives with controllable correlation. Based on the decomposition method, two new multi-objectivization techniques called Non-Dominance Search (NDS) and Non-Dominance Exploitation (NDE) are developed, respectively. NDS is combined with the Iterated Local Search (ILS) metaheuristic (with fixed neighborhood structure), while NDE is embedded within the Iterated Lin-Kernighan (ILK) metaheuristic (with varied neighborhood structure). The resultant metaheuristics are called ILS+NDS and ILK+NDE, respectively. Empirical studies on some TSP and UBQP instances show that with appropriate correlation between the sub-objectives, there are more chances to escape from local optima when the new starting solution is selected from the non-dominated solutions defined by the decomposed sub-objectives. Experimental results also show that ILS+NDS and ILK+NDE both significantly outperform their counterparts on most of the test instances.
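The idea of splitting a sum-of-the-parts objective into two sub-objectives can be sketched for the TSP as follows. This is only one plausible reading: the abstract does not state how the paper controls the correlation, so the `spread` parameter and the functions `decompose_weights`, `tour_cost` and `dominates` are assumptions made for illustration. Each edge weight is split into two random shares that add back up to the original, so the two sub-objectives always sum to the original tour length.

```python
import random

def decompose_weights(dist, spread, rng):
    """Split a symmetric distance matrix into two matrices that sum to it.
    Assumption: `spread` in [0, 1] sets how far each edge's share may stray
    from an even split; spread = 0 makes the two sub-objectives identical."""
    n = len(dist)
    d1 = [[0.0] * n for _ in range(n)]
    d2 = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            share = 0.5 + spread * (rng.random() - 0.5)
            d1[i][j] = d1[j][i] = share * dist[i][j]
            d2[i][j] = d2[j][i] = dist[i][j] - d1[i][j]
    return d1, d2

def tour_cost(tour, dist):
    """Sum of edge weights along the closed tour."""
    return sum(dist[tour[k]][tour[(k + 1) % len(tour)]] for k in range(len(tour)))

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimisation)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

distances = [
    [0.0, 2.0, 9.0, 4.0],
    [2.0, 0.0, 6.0, 3.0],
    [9.0, 6.0, 0.0, 5.0],
    [4.0, 3.0, 5.0, 0.0],
]
part1, part2 = decompose_weights(distances, spread=1.0, rng=random.Random(1))
tour = [0, 1, 2, 3]
sub_objectives = (tour_cost(tour, part1), tour_cost(tour, part2))
```

A search procedure could then compare candidate tours with `dominates` on their sub-objective pairs and restart from a non-dominated one, which is the kind of selection the abstract refers to.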

From Adaptive Kernel Density Estimation to Sparse Mixture Models

Dec 11, 2018

Colas Schretter, Jianyong Sun, Peter Schelkens

Abstract: We introduce a balloon estimator in a generalized expectation-maximization method for estimating all parameters of a Gaussian mixture model given one data sample per mixture component. Instead of explicitly limiting the model size, this regularization strategy yields low-complexity sparse models where the number of effective mixture components reduces with an increase of a smoothing probability parameter $\mathbf{P>0}$. This semi-parametric method bridges from non-parametric adaptive kernel density estimation (KDE) to parametric ordinary least-squares when $\mathbf{P=1}$. Experiments show that simpler sparse mixture models retain the level of detail present in the adaptive KDE solution.

* in Proceedings of iTWIST'18, Paper-ID: 20, Marseille, France, November, 21-23, 2018
