Abstract:Order picking is a crucial operation in warehouses that significantly impacts overall efficiency and profitability. This study addresses the dynamic order picking problem, a significant concern in modern warehouse management where real-time adaptation to fluctuating order arrivals and efficient picker routing are crucial. Traditional methods, often assuming fixed order sets, fall short in this dynamic environment. We utilize Deep Reinforcement Learning (DRL) as a solution methodology to handle the inherent uncertainties in customer demands. We focus on a single-block warehouse with an autonomous picking device, eliminating human behavioral factors. Our DRL framework enables the dynamic optimization of picker routes, significantly reducing order throughput times, especially under high order arrival rates. Experiments demonstrate a substantial decrease in order throughput time and unfulfilled orders compared to benchmark algorithms. We further investigate integrating a hyperparameter in the reward function that allows for flexible balancing between distance traveled and order completion time. Finally, we demonstrate the robustness of our DRL model for out-of-sample test instances.
Abstract:Order Picker Routing is a critical issue in Warehouse Operations Management. Due to the complexity of the problem and the need for quick solutions, suboptimal algorithms are frequently employed in practice. However, Reinforcement Learning offers an appealing alternative to traditional heuristics, potentially outperforming existing methods in terms of speed and accuracy. We introduce an attention based neural network for modeling picker tours, which is trained using Reinforcement Learning. Our method is evaluated against existing heuristics across a range of problem parameters to demonstrate its efficacy. A key advantage of our proposed method is its ability to offer an option to reduce the perceived complexity of routes.
Abstract:This paper proposes a hybrid genetic algorithm for solving the Multiple Traveling Salesman Problem (mTSP) to minimize the length of the longest tour. The genetic algorithm utilizes a TSP sequence as the representation of each individual, and a dynamic programming algorithm is employed to evaluate the individual and find the optimal mTSP solution for the given sequence of cities. A novel crossover operator is designed to combine similar tours from two parents and offers great diversity for the population. For some of the generated offspring, we detect and remove intersections between tours to obtain a solution with no intersections. This is particularly useful for the min-max mTSP. The generated offspring are also improved by a self-adaptive random local search and a thorough neighborhood search. Our algorithm outperforms all existing algorithms on average, with similar cutoff time thresholds, when tested against multiple benchmark sets found in the literature. Additionally, we improve the best-known solutions for 21 out of 89 instances on four benchmark sets.
Abstract:There are emerging transportation problems known as the Traveling Salesman Problem with Drone (TSPD) and the Flying Sidekick Traveling Salesman Problem (FSTSP) that involve the use of a drone in conjunction with a truck for package delivery. This study represents a hybrid genetic algorithm for solving TSPD and FSTSP by combining local search methods and dynamic programming. Similar algorithms exist in the literature. Our algorithm, however, considers more sophisticated chromosomes and simpler dynamic programming to enable broader exploration by the genetic algorithm and efficient exploitation through dynamic programming and local searches. The key contribution of this paper is the discovery of how decision-making processes should be divided among the layers of genetic algorithm, dynamic programming, and local search. In particular, our genetic algorithm generates the truck and the drone sequences separately and encodes them in a type-aware chromosome, wherein each customer is assigned to either the truck or the drone. We apply local searches to each chromosome, which is decoded by dynamic programming for fitness evaluation. Our dynamic programming algorithm merges the two sequences by determining optimal launch and landing locations for the drone to construct a TSPD solution represented by the chromosome. We propose novel type-aware order crossover operations and effective local search methods. A strategy to escape from local optima is proposed. Our new algorithm is shown to outperform existing algorithms on most benchmark instances in both quality and time. Our algorithms found the new best solutions for 538 TSPD instances out of 920 and 93 FSTSP instances out of 132.