Anne Meyer

Reinforcement Learning for AMR Charging Decisions: The Impact of Reward and Action Space Design

May 16, 2025

Janik Bischoff, Alexandru Rinciog, Anne Meyer

Abstract: We propose a novel reinforcement learning (RL) design to optimize the charging strategy for autonomous mobile robots in large-scale block stacking warehouses. RL design involves a wide array of choices, most of which can be evaluated only through lengthy experimentation. Our study focuses on how different reward and action space configurations, ranging from flexible setups to more guided, domain-informed design configurations, affect agent performance. Using heuristic charging strategies as a baseline, we demonstrate the superiority of flexible, RL-based approaches in terms of service times. Furthermore, our findings highlight a trade-off: while more open-ended designs are able to discover well-performing strategies on their own, they may require longer convergence times and are less stable, whereas guided configurations lead to a more stable learning process but display a more limited generalization potential. Our contributions are threefold. First, we extend SLAPStack, an open-source, RL-compatible simulation framework, to accommodate charging strategies. Second, we introduce a novel RL design for tackling the charging strategy problem. Finally, we introduce several novel adaptive baseline heuristics and reproducibly evaluate the design using a Proximal Policy Optimization agent while varying the design configuration, with a focus on reward.

* Under review LION19: The 19th Learning and Intelligent OptimizatioN Conference

Leveraging Large Language Models to Develop Heuristics for Emerging Optimization Problems

Mar 05, 2025

Thomas Bömer, Nico Koltermann, Max Disselnmeyer, Laura Dörr, Anne Meyer

Abstract: Combinatorial optimization problems often rely on heuristic algorithms to generate efficient solutions. However, the manual design of heuristics is resource-intensive and constrained by the designer's expertise. Recent advances in artificial intelligence, particularly large language models (LLMs), have demonstrated the potential to automate heuristic generation through evolutionary frameworks. When designing constructive heuristics, recent works focus only on well-known combinatorial optimization problems such as the traveling salesman problem and the online bin packing problem. This study investigates whether LLMs can effectively generate heuristics for niche, not yet broadly researched optimization problems, using the unit-load pre-marshalling problem as an example case. We propose the Contextual Evolution of Heuristics (CEoH) framework, an extension of the Evolution of Heuristics (EoH) framework, which incorporates problem-specific descriptions to enhance in-context learning during heuristic generation. Through computational experiments, we evaluate CEoH and EoH and compare the results. Results indicate that CEoH enables smaller LLMs to generate high-quality heuristics more consistently and even to outperform larger models. Larger models demonstrate robust performance with or without contextualized prompts. The generated heuristics scale to diverse instance configurations.

* Under review LION19: The 19th Learning and Intelligent OptimizatioN Conference

Mission planning for emergency rapid mapping with drones

Mar 02, 2022

Katharina Glock, Anne Meyer

Abstract: We introduce a mission planning concept for routing unmanned aerial vehicles (UAVs) through a set of sampling locations in the immediate aftermath of an incident such as a fire or chemical accident. Using interpolation methods that account for the spatial interdependencies inherent in the surveyed phenomenon, these samples allow predicting the distribution of hazardous substances across the affected area. We define the generalized correlated team orienteering problem (GCorTOP) for selecting informative samples considering spatial correlations between observed and unobserved locations as well as priorities in the surveyed area. To quickly provide high-quality solutions in time-sensitive situations, we propose a two-phase multi-start adaptive large neighborhood search (2MLS). We show the competitiveness of the solution approach using benchmark instances for the team orienteering problem and investigate the performance of the proposed models and solution approach in an extensive study based on newly introduced benchmark instances for the mission planning problem.

* Transportation Science 54(2):534-560 (2020)

Towards Standardizing Reinforcement Learning Approaches for Stochastic Production Scheduling

Apr 16, 2021

Alexandru Rinciog, Anne Meyer

Abstract: Recent years have seen rising interest in using machine learning, particularly reinforcement learning (RL), for production scheduling problems of varying degrees of complexity. The general approach is to break down the scheduling problem into a Markov Decision Process (MDP), whereupon a simulation implementing the MDP is used to train an RL agent. Since existing studies rely on (sometimes) complex simulations for which the code is unavailable, the experiments presented are hard or, in the case of stochastic environments, impossible to reproduce accurately. Furthermore, there is a vast array of RL designs to choose from. To make RL methods widely applicable in production scheduling and work out their strengths for the industry, the standardization of model descriptions (both production setup and RL design) and of the validation scheme is a prerequisite. Our contribution is threefold. First, we standardize the description of production setups used in RL studies based on established nomenclature. Second, we classify RL design choices from existing publications. Third, we propose recommendations for a validation scheme focusing on reproducibility and sufficient benchmarking.
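To make the MDP framing in the abstract above concrete, here is a minimal sketch of a stochastic scheduling environment. It is not taken from the paper: the class `ToySchedulingEnv`, the single-machine setup, the uniform duration noise, and the flow-time reward are all assumptions chosen for illustration. The point it shows is that fixing the random seed on reset makes a stochastic simulation reproducible.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ToySchedulingEnv:
    """Hypothetical single-machine scheduling MDP (illustrative, not the paper's model)."""
    mean_durations: tuple = (3.0, 1.0, 2.0)
    seed: int = 0
    rng: random.Random = field(init=False, repr=False)
    remaining: list = field(init=False)
    clock: float = field(init=False)

    def reset(self):
        # Re-seeding on every reset makes each episode reproducible.
        self.rng = random.Random(self.seed)
        self.remaining = list(range(len(self.mean_durations)))
        self.clock = 0.0
        return tuple(self.remaining)

    def step(self, job):
        # Action: index of the waiting job to process next.
        if job not in self.remaining:
            raise ValueError(f"job {job} is not waiting")
        # Stochastic transition: the realized duration is a noisy version of the mean.
        duration = self.mean_durations[job] * self.rng.uniform(0.5, 1.5)
        self.clock += duration
        self.remaining.remove(job)
        # Reward: negative completion time, so the return is minus the total flow time.
        reward = -self.clock
        done = not self.remaining
        return tuple(self.remaining), reward, done


def shortest_expected_first(env, state):
    # Baseline dispatching rule: pick the waiting job with the smallest mean duration.
    return min(state, key=lambda j: env.mean_durations[j])


def run_episode(env, policy):
    # Roll out one episode and return the undiscounted sum of rewards.
    state, total, done = env.reset(), 0.0, False
    while not done:
        state, reward, done = env.step(policy(env, state))
        total += reward
    return total
```

Calling `run_episode(ToySchedulingEnv(seed=42), shortest_expected_first)` twice yields the same return, which is the kind of reproducibility an RL agent trained on such a simulation would need in order to be benchmarked against a heuristic baseline.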

Travel Time Prediction using Tree-Based Ensembles

May 28, 2020

He Huang, Martin Pouls, Anne Meyer, Markus Pauly

Abstract: In this paper, we consider the task of predicting travel times between two arbitrary points in an urban scenario. We view this problem from two temporal perspectives: long-term forecasting with a horizon of several days and short-term forecasting with a horizon of one hour. Both of these perspectives are relevant for planning tasks in the context of urban mobility and transportation services. We utilize tree-based ensemble methods that we train and evaluate on a dataset of taxi trip records from New York City. Through extensive data analysis, we identify relevant temporal and spatial features. We also engineer additional features based on weather and routing data. The latter is obtained via a routing solver operating on the road network. The computational results show that the addition of this routing data can be beneficial to the model performance. Moreover, employing different models for short- and long-term prediction is useful, as short-term models are better suited to mirror current traffic conditions. In fact, we show that accurate short-term predictions may be obtained with only a small amount of training data.
