Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Mingfeng Fan

Preference-Driven Multi-Objective Combinatorial Optimization with Conditional Computation

Jun 10, 2025

Mingfeng Fan, Jianan Zhou, Yifeng Zhang, Yaoxin Wu, Jinbiao Chen, Guillaume Adrien Sartoretti

Abstract:Recent deep reinforcement learning methods have achieved remarkable success in solving multi-objective combinatorial optimization problems (MOCOPs) by decomposing them into multiple subproblems, each associated with a specific weight vector. However, these methods typically treat all subproblems equally and solve them using a single model, hindering the effective exploration of the solution space and thus leading to suboptimal performance. To overcome the limitation, we propose POCCO, a novel plug-and-play framework that enables adaptive selection of model structures for subproblems, which are subsequently optimized based on preference signals rather than explicit reward values. Specifically, we design a conditional computation block that routes subproblems to specialized neural architectures. Moreover, we propose a preference-driven optimization algorithm that learns pairwise preferences between winning and losing solutions. We evaluate the efficacy and versatility of POCCO by applying it to two state-of-the-art neural methods for MOCOPs. Experimental results across four classic MOCOP benchmarks demonstrate its significant superiority and strong generalization.

* 22 pages, 6 figures, under review

Via

Access Paper or Ask Questions

Unicorn: A Universal and Collaborative Reinforcement Learning Approach Towards Generalizable Network-Wide Traffic Signal Control

Mar 14, 2025

Yifeng Zhang, Yilin Liu, Ping Gong, Peizhuo Li, Mingfeng Fan, Guillaume Sartoretti

Abstract:Adaptive traffic signal control (ATSC) is crucial in reducing congestion, maximizing throughput, and improving mobility in rapidly growing urban areas. Recent advancements in parameter-sharing multi-agent reinforcement learning (MARL) have greatly enhanced the scalable and adaptive optimization of complex, dynamic flows in large-scale homogeneous networks. However, the inherent heterogeneity of real-world traffic networks, with their varied intersection topologies and interaction dynamics, poses substantial challenges to achieving scalable and effective ATSC across different traffic scenarios. To address these challenges, we present Unicorn, a universal and collaborative MARL framework designed for efficient and adaptable network-wide ATSC. Specifically, we first propose a unified approach to map the states and actions of intersections with varying topologies into a common structure based on traffic movements. Next, we design a Universal Traffic Representation (UTR) module with a decoder-only network for general feature extraction, enhancing the model's adaptability to diverse traffic scenarios. Additionally, we incorporate an Intersection Specifics Representation (ISR) module, designed to identify key latent vectors that represent the unique intersection's topology and traffic dynamics through variational inference techniques. To further refine these latent representations, we employ a contrastive learning approach in a self-supervised manner, which enables better differentiation of intersection-specific features. Moreover, we integrate the state-action dependencies of neighboring agents into policy optimization, which effectively captures dynamic agent interactions and facilitates efficient regional collaboration. Our results show that Unicorn outperforms other methods across various evaluation metrics, highlighting its potential in complex, dynamic traffic networks.

Via

Access Paper or Ask Questions

DL-DRL: A double-layer deep reinforcement learning approach for large-scale task scheduling of multi-UAV

Aug 04, 2022

Xiao Mao, Guohua Wu, Mingfeng Fan

Figure 1 for DL-DRL: A double-layer deep reinforcement learning approach for large-scale task scheduling of multi-UAV

Figure 2 for DL-DRL: A double-layer deep reinforcement learning approach for large-scale task scheduling of multi-UAV

Figure 3 for DL-DRL: A double-layer deep reinforcement learning approach for large-scale task scheduling of multi-UAV

Figure 4 for DL-DRL: A double-layer deep reinforcement learning approach for large-scale task scheduling of multi-UAV

Abstract:This paper studies deep reinforcement learning (DRL) for the task scheduling problem of multiple unmanned aerial vehicles (UAVs). Current approaches generally use exact and heuristic algorithms to solve the problem, while the computation time rapidly increases as the task scale grows and heuristic rules need manual design. As a self-learning method, DRL can obtain a high-quality solution quickly without hand-engineered rules. However, the huge decision space makes the training of DRL models becomes unstable in situations with large-scale tasks. In this work, to address the large-scale problem, we develop a divide and conquer-based framework (DCF) to decouple the original problem into a task allocation and a UAV route planning subproblems, which are solved in the upper and lower layers, respectively. Based on DCF, a double-layer deep reinforcement learning approach (DL-DRL) is proposed, where an upper-layer DRL model is designed to allocate tasks to appropriate UAVs and a lower-layer DRL model [i.e., the widely used attention model (AM)] is applied to generate viable UAV routes. Since the upper-layer model determines the input data distribution of the lower-layer model, and its reward is calculated via the lower-layer model during training, we develop an interactive training strategy (ITS), where the whole training process consists of pre-training, intensive training, and alternate training processes. Experimental results show that our DL-DRL outperforms mainstream learning-based and most traditional methods, and is competitive with the state-of-the-art heuristic method [i.e., OR-Tools], especially on large-scale problems. The great generalizability of DL-DRL is also verified by testing the model learned for a problem size to larger ones. Furthermore, an ablation study demonstrates that our ITS can reach a compromise between the model performance and training duration.

Via

Access Paper or Ask Questions

An Overview and Experimental Study of Learning-based Optimization Algorithms for Vehicle Routing Problem

Jul 15, 2021

Bingjie Li, Guohua Wu, Yongming He, Mingfeng Fan, Witold Pedrycz

Figure 1 for An Overview and Experimental Study of Learning-based Optimization Algorithms for Vehicle Routing Problem

Figure 2 for An Overview and Experimental Study of Learning-based Optimization Algorithms for Vehicle Routing Problem

Figure 3 for An Overview and Experimental Study of Learning-based Optimization Algorithms for Vehicle Routing Problem

Figure 4 for An Overview and Experimental Study of Learning-based Optimization Algorithms for Vehicle Routing Problem

Abstract:Vehicle routing problem (VRP) is a typical discrete combinatorial optimization problem, and many models and algorithms have been proposed to solve VRP and variants. Although existing approaches has contributed a lot to the development of this field, these approaches either are limited in problem size or need manual intervening in choosing parameters. To tackle these difficulties, many studies consider learning-based optimization algorithms to solve VRP. This paper reviews recent advances in this field and divides relevant approaches into end-to-end approaches and step-by-step approaches. We design three part experiments to justly evaluate performance of four representative learning-based optimization algorithms and conclude that combining heuristic search can effectively improve learning ability and sampled efficiency of LBO models. Finally we point out that research trend of LBO algorithms is to solve large-scale and multiple constraints problems from real world.

* 18 pages, 11 figures

Via

Access Paper or Ask Questions

An Autonomous Path Planning Method for Unmanned Aerial Vehicle based on A Tangent Intersection and Target Guidance Strategy

Jun 07, 2020

Huan Liu, Xiamiao Li, Mingfeng Fan, Guohua Wu, Witold Pedrycz, Ponnuthurai Nagaratnam Suganthan

Figure 1 for An Autonomous Path Planning Method for Unmanned Aerial Vehicle based on A Tangent Intersection and Target Guidance Strategy

Figure 2 for An Autonomous Path Planning Method for Unmanned Aerial Vehicle based on A Tangent Intersection and Target Guidance Strategy

Figure 3 for An Autonomous Path Planning Method for Unmanned Aerial Vehicle based on A Tangent Intersection and Target Guidance Strategy

Figure 4 for An Autonomous Path Planning Method for Unmanned Aerial Vehicle based on A Tangent Intersection and Target Guidance Strategy

Abstract:Unmanned aerial vehicle (UAV) path planning enables UAVs to avoid obstacles and reach the target efficiently. To generate high-quality paths without obstacle collision for UAVs, this paper proposes a novel autonomous path planning algorithm based on a tangent intersection and target guidance strategy (APPATT). Guided by a target, the elliptic tangent graph method is used to generate two sub-paths, one of which is selected based on heuristic rules when confronting an obstacle. The UAV flies along the selected sub-path and repeatedly adjusts its flight path to avoid obstacles through this way until the collision-free path extends to the target. Considering the UAV kinematic constraints, the cubic B-spline curve is employed to smooth the waypoints for obtaining a feasible path. Compared with A*, PRM, RRT and VFH, the experimental results show that APPATT can generate the shortest collision-free path within 0.05 seconds for each instance under static environments. Moreover, compared with VFH and RRTRW, APPATT can generate satisfactory collision-free paths under uncertain environments in a nearly real-time manner. It is worth noting that APPATT has the capability of escaping from simple traps within a reasonable time.

Via

Access Paper or Ask Questions