Zizhen Zhang

Neural Combinatorial Optimization via Preference Optimization

Mar 10, 2025

Zijun Liao, Jinbiao Chen, Debing Wang, Zizhen Zhang, Jiahai Wang

Abstract: Neural Combinatorial Optimization (NCO) has emerged as a promising approach for NP-hard problems. However, prevailing RL-based methods suffer from low sample efficiency due to sparse rewards and underused solutions. We propose Preference Optimization for Combinatorial Optimization (POCO), a training paradigm that leverages solution preferences via objective values. It introduces: (1) an efficient preference pair construction that better explores and exploits solutions, and (2) a novel loss function that adaptively scales gradients via objective differences, removing reliance on reward models or reference policies. Experiments on Job-Shop Scheduling (JSP), Traveling Salesman (TSP), and Flexible Job-Shop Scheduling (FJSP) show that POCO outperforms state-of-the-art neural methods, reducing optimality gaps while keeping inference efficient. POCO is architecture-agnostic, enabling seamless integration with existing NCO models, and establishes preference optimization as a principled framework for combinatorial optimization.
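
To make the idea of a preference loss scaled by objective differences concrete, here is a minimal sketch of a generic Bradley-Terry-style pairwise loss. It is not the POCO loss: the function names, the best-versus-rest pairing, and the relative-gap scale factor are all assumptions chosen for illustration, and objectives are assumed positive with lower values being better.

```python
import math

def best_vs_rest_pairs(objectives):
    """Pair the best sampled solution with every strictly worse one.

    Hypothetical pairing rule; returns (better_index, worse_index) tuples.
    """
    order = sorted(range(len(objectives)), key=objectives.__getitem__)
    best = order[0]
    return [(best, i) for i in order[1:] if objectives[i] > objectives[best]]

def pairwise_preference_loss(logp_better, logp_worse, obj_better, obj_worse):
    """Loss for one (better, worse) pair of solutions.

    logp_*: log-probability the policy assigns to each solution.
    obj_*:  objective values, assumed positive, lower is better.
    """
    if not 0 < obj_better <= obj_worse:
        raise ValueError("expected 0 < obj_better <= obj_worse")
    # Assumed scale: 1 plus the relative objective gap, so larger gaps
    # push the policy harder toward the better solution.
    scale = 1.0 + (obj_worse - obj_better) / obj_better
    margin = scale * (logp_better - logp_worse)
    # -log(sigmoid(margin)), written in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
```

The loss needs only the policy's own log-probabilities and the objective values, which is consistent with the abstract's point that no reward model or reference policy is required.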

Efficient Meta Neural Heuristic for Multi-Objective Combinatorial Optimization

Oct 22, 2023

Jinbiao Chen, Jiahai Wang, Zizhen Zhang, Zhiguang Cao, Te Ye, Siyuan Chen

Abstract: Recently, neural heuristics based on deep reinforcement learning have exhibited promise in solving multi-objective combinatorial optimization problems (MOCOPs). However, they still struggle to achieve high learning efficiency and solution quality. To tackle this issue, we propose an efficient meta neural heuristic (EMNH), in which a meta-model is first trained and then fine-tuned with a few steps to solve corresponding single-objective subproblems. Specifically, for the training process, a (partial) architecture-shared multi-task model is leveraged to achieve parallel learning for the meta-model, so as to speed up the training; meanwhile, a scaled symmetric sampling method with respect to the weight vectors is designed to stabilize the training. For the fine-tuning process, an efficient hierarchical method is proposed to systematically tackle all the subproblems. Experimental results on the multi-objective traveling salesman problem (MOTSP), multi-objective capacitated vehicle routing problem (MOCVRP), and multi-objective knapsack problem (MOKP) show that EMNH is able to outperform the state-of-the-art neural heuristics in terms of solution quality and learning efficiency, and yield solutions competitive with strong traditional heuristics while consuming much less time.

* Accepted at NeurIPS 2023

Neural Multi-Objective Combinatorial Optimization with Diversity Enhancement

Oct 22, 2023

Jinbiao Chen, Zizhen Zhang, Zhiguang Cao, Yaoxin Wu, Yining Ma, Te Ye, Jiahai Wang

Abstract: Most existing neural methods for multi-objective combinatorial optimization (MOCO) problems rely solely on decomposition, which often leads to repetitive solutions for the respective subproblems and thus a limited Pareto set. Beyond decomposition, we propose a novel neural heuristic with diversity enhancement (NHDE) to produce more Pareto solutions from two perspectives. On the one hand, to hinder duplicated solutions for different subproblems, we propose an indicator-enhanced deep reinforcement learning method to guide the model, and design a heterogeneous graph attention mechanism to capture the relations between the instance graph and the Pareto front graph. On the other hand, to excavate more solutions in the neighborhood of each subproblem, we present a multiple Pareto optima strategy to sample and preserve desirable solutions. Experimental results on classic MOCO problems show that our NHDE is able to generate a Pareto front with higher diversity, thereby achieving superior overall performance. Moreover, our NHDE is generic and can be applied to different neural methods for MOCO.

* Accepted at NeurIPS 2023

MODRL/D-EL: Multiobjective Deep Reinforcement Learning with Evolutionary Learning for Multiobjective Optimization

Jul 16, 2021

Yongxin Zhang, Jiahai Wang, Zizhen Zhang, Yalan Zhou

Abstract: Learning-based heuristics for solving combinatorial optimization problems have recently attracted much academic attention. While most existing works consider only single-objective problems with simple constraints, many real-world problems are multiobjective and contain a rich set of constraints. This paper proposes a multiobjective deep reinforcement learning with evolutionary learning algorithm for a typical complex problem called the multiobjective vehicle routing problem with time windows (MO-VRPTW). In the proposed algorithm, the decomposition strategy is applied to generate subproblems for a set of attention models. Comprehensive context information is introduced to further enhance the attention models. Evolutionary learning is also employed to fine-tune the parameters of the models. The experimental results on MO-VRPTW instances demonstrate the superiority of the proposed algorithm over other learning-based and iterative-based approaches.

Meta-Learning-based Deep Reinforcement Learning for Multiobjective Optimization Problems

May 06, 2021

Zizhen Zhang, Zhiyuan Wu, Jiahai Wang

Abstract: Deep reinforcement learning (DRL) has recently shown success in tackling complex combinatorial optimization problems. When these problems are extended to multiobjective ones, it becomes difficult for existing DRL approaches to deal flexibly and efficiently with the multiple subproblems determined by weight decomposition of the objectives. This paper proposes a concise meta-learning-based DRL approach. It first trains a meta-model by meta-learning. The meta-model is fine-tuned with a few update steps to derive submodels for the corresponding subproblems. The Pareto front is built accordingly. The computational experiments on multiobjective traveling salesman problems demonstrate the superiority of our method over most learning-based and iteration-based approaches.
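
As an illustration of how a Pareto front can be assembled from the solutions returned by the individual submodels, the following sketch filters a list of objective vectors down to the non-dominated ones. It assumes every objective is minimized; the function name and the sample data are hypothetical and not taken from the paper.

```python
def pareto_front(points):
    """Return the non-dominated objective vectors (all objectives minimized)."""
    def dominates(a, b):
        # a dominates b if it is no worse everywhere and strictly better somewhere.
        return (all(x <= y for x, y in zip(a, b))
                and any(x < y for x, y in zip(a, b)))

    # Drop exact duplicates while preserving the input order.
    unique = list(dict.fromkeys(tuple(p) for p in points))
    return [p for p in unique if not any(dominates(q, p) for q in unique)]

# Hypothetical bi-objective values, one per fine-tuned submodel.
candidates = [(1, 5), (2, 2), (3, 3), (5, 1), (2, 2)]
front = pareto_front(candidates)
```

The quadratic-time filter is adequate for the few hundred solutions a decomposition-based method typically produces; larger sets would call for a sort-based approach.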

MODRL/D-AM: Multiobjective Deep Reinforcement Learning Algorithm Using Decomposition and Attention Model for Multiobjective Optimization

Feb 13, 2020

Hong Wu, Jiahai Wang, Zizhen Zhang

Abstract: Recently, a deep reinforcement learning method was proposed to solve multiobjective optimization problems. In this method, the multiobjective optimization problem is decomposed into a number of single-objective optimization subproblems, and all the subproblems are optimized in a collaborative manner. Each subproblem is modeled with a pointer network, and the model is trained with reinforcement learning. However, when the pointer network extracts the features of an instance, it ignores the underlying structure information of the input nodes. Thus, this paper proposes a multiobjective deep reinforcement learning method using decomposition and an attention model to solve multiobjective optimization problems. In our method, each subproblem is solved by an attention model, which can exploit the structure features as well as the node features of the input nodes. The experimental results on the multiobjective travelling salesman problem show that the proposed algorithm achieves better performance than the previous method.
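
The decomposition step can be sketched as follows, assuming weighted-sum scalarization with evenly spaced weights on a bi-objective problem. The abstract does not say which scalarization is used, so this is one common choice rather than the paper's method, and all names here are hypothetical.

```python
def weight_vectors(n):
    """n evenly spaced weight vectors (w, 1 - w) for two objectives."""
    if n < 2:
        raise ValueError("need at least two weight vectors")
    return [(i / (n - 1), 1 - i / (n - 1)) for i in range(n)]

def scalarize(objectives, weights):
    """Weighted sum turning an objective vector into a single value."""
    return sum(w * f for w, f in zip(weights, objectives))

def best_for_each_weight(candidates, weights_list):
    """For each subproblem, pick the candidate with the lowest scalarized value."""
    return [min(candidates, key=lambda c: scalarize(c, w)) for w in weights_list]

# Hypothetical objective vectors, e.g. two tour-length criteria.
candidates = [(4.0, 2.0), (1.0, 6.0), (3.0, 3.0)]
picks = best_for_each_weight(candidates, weight_vectors(3))
```

Each weight vector defines one single-objective subproblem; in the papers listed here a neural model, rather than a lookup over fixed candidates, produces the solution for each one.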

A Deep Reinforcement Learning Algorithm Using Dynamic Attention Model for Vehicle Routing Problems

Feb 09, 2020

Bo Peng, Jiahai Wang, Zizhen Zhang

Abstract: Recent research shows that machine learning has the potential to learn better heuristics than those designed by humans for solving combinatorial optimization problems. A deep neural network is used to characterize the input instance for constructing a feasible solution incrementally. Recently, an attention model was proposed to solve routing problems. In this model, the state of an instance is represented by node features that are fixed over time. However, the state of an instance changes according to the decisions the model makes at different construction steps, and the node features should be updated correspondingly. Therefore, this paper presents a dynamic attention model with a dynamic encoder-decoder architecture, which enables the model to explore node features dynamically and exploit hidden structure information effectively at different construction steps. This paper focuses on a challenging NP-hard problem, the vehicle routing problem. The experiments indicate that our model outperforms the previous methods and also shows good generalization performance.

* 15 pages, 8 figures

Collective Mobile Sequential Recommendation: A Recommender System for Multiple Taxicabs

Jun 22, 2019

Tongwen Wu, Zizhen Zhang, Yanzhi Li, Jiahai Wang

Abstract: Mobile sequential recommendation was originally designed to find a promising route for a single taxicab. Directly applying it to multiple taxicabs may cause an excessive overlap of recommended routes. The multi-taxicab recommendation problem is challenging and has been less studied. In this paper, we first formalize a collective mobile sequential recommendation problem based on a classic mathematical model, which characterizes time-varying influence among competing taxicabs. Next, we propose a new evaluation metric for a collection of taxicab routes that aims to minimize the sum of potential travel time. We then develop an efficient algorithm to calculate the metric and design a greedy recommendation method to approximate the solution. Finally, numerical experiments show the superiority of our methods. In trace-driven simulation, the set of routes recommended by our method significantly outperforms those obtained by conventional methods.

A Tabu Search Algorithm for the Multi-period Inspector Scheduling Problem

Sep 17, 2014

Hu Qin, Zizhen Zhang, Yubin Xie, Andrew Lim

Abstract: This paper introduces a multi-period inspector scheduling problem (MPISP), which is a new variant of the multi-trip vehicle routing problem with time windows (VRPTW). In the MPISP, each inspector is scheduled to perform a route in a given multi-period planning horizon. At the end of each period, each inspector is not required to return to the depot but has to stay at one of the vertices for recuperation. If the remaining time of the current period is insufficient for an inspector to travel from his/her current vertex A to a certain vertex B, he/she can choose either to wait at vertex A until the start of the next period or to travel to a vertex C that is closer to vertex B. Therefore, the shortest transit time between any vertex pair is affected by the length of the period and the departure time. We first describe an approach for computing the shortest transit time between any pair of vertices with an arbitrary departure time. To solve the MPISP, we then propose several local search operators adapted from classical operators for the VRPTW and integrate them into a tabu search framework. In addition, we present a constrained knapsack model that is able to produce an upper bound for the problem. Finally, we evaluate the effectiveness of our algorithm with extensive experiments based on a set of test instances. Our computational results indicate that our approach generates high-quality solutions.

An Enhanced Branch-and-bound Algorithm for the Talent Scheduling Problem

Jan 23, 2014

Zizhen Zhang, Hu Qin, Xiaocong Liang, Andrew Lim

Abstract: The talent scheduling problem is a simplified version of the real-world film shooting problem, which aims to determine a shooting sequence so as to minimize the total cost of the actors involved. In this article, we first formulate the problem as an integer linear programming model. Next, we devise a branch-and-bound algorithm to solve the problem. The branch-and-bound algorithm is enhanced by several accelerating techniques, including preprocessing, dominance rules and caching search states. Extensive experiments over two sets of benchmark instances suggest that our algorithm is superior to the current best exact algorithm. Finally, additional experiments reveal the impacts of different parameter settings.
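
The objective can be illustrated with a short sketch. It assumes the standard formulation of talent scheduling, in which each actor is paid a daily wage for every shooting day from their first scene to their last, including days on which they are idle. The solver below is plain exhaustive enumeration for tiny instances, not the branch-and-bound algorithm of the paper, and the instance data and names are hypothetical.

```python
from itertools import permutations

def schedule_cost(order, scenes, durations, wages):
    """Total actor cost of shooting the scenes in the given order.

    order:     sequence of scene indices
    scenes:    scenes[s] is the set of actors needed in scene s
    durations: durations[s] is the length of scene s in days
    wages:     mapping actor -> daily wage
    """
    cost = 0
    for actor, wage in wages.items():
        slots = [pos for pos, s in enumerate(order) if actor in scenes[s]]
        if not slots:
            continue  # actor appears in no scene and is never paid
        first, last = slots[0], slots[-1]
        # Paid for every day between first and last appearance, inclusive.
        cost += wage * sum(durations[order[p]] for p in range(first, last + 1))
    return cost

def brute_force_schedule(scenes, durations, wages):
    """Try every scene order; only feasible for a handful of scenes."""
    return min(permutations(range(len(scenes))),
               key=lambda o: schedule_cost(o, scenes, durations, wages))

# Hypothetical instance: actor "a" is expensive and appears in scenes 0 and 2.
scenes = [{"a"}, {"b"}, {"a"}]
durations = [1, 1, 1]
wages = {"a": 10, "b": 1}
best = brute_force_schedule(scenes, durations, wages)
```

In this instance, shooting the two scenes of actor "a" back to back avoids paying for an idle day, which is the kind of saving an exact method searches for at scale.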
