Tobias Glasmachers

A Superlinearly Convergent Evolution Strategy

May 16, 2025

Tobias Glasmachers

Abstract: We present a hybrid of an evolution strategy and a quasi-Newton method. The design is based on the Hessian Estimation Evolution Strategy, which iteratively estimates the inverse square root of the Hessian matrix of the problem. This is akin to a quasi-Newton method and corresponding derivative-free trust-region algorithms like NEWUOA. The proposed method therefore replaces the global recombination step commonly found in non-elitist evolution strategies with a quasi-Newton step. Numerical results show superlinear convergence, resulting in improved performance in particular on smooth convex problems.

Deep Reinforcement Learning Based Navigation with Macro Actions and Topological Maps

Apr 25, 2025

Simon Hakenes, Tobias Glasmachers

Abstract: This paper addresses the challenge of navigation in large, visually complex environments with sparse rewards. We propose a method that uses object-oriented macro actions grounded in a topological map, allowing a simple Deep Q-Network (DQN) to learn effective navigation policies. The agent builds a map by detecting objects from RGBD input and selecting discrete macro actions that correspond to navigating to these objects. This abstraction drastically reduces the complexity of the underlying reinforcement learning problem and enables generalization to unseen environments. We evaluate our approach in a photorealistic 3D simulation and show that it significantly outperforms a random baseline under both immediate and terminal reward conditions. Our results demonstrate that topological structure and macro-level abstraction can enable sample-efficient learning even from pixel data.

* 14 pages, 6 figures

Curriculum RL meets Monte Carlo Planning: Optimization of a Real World Container Management Problem

Mar 21, 2025

Abhijeet Pendyala, Tobias Glasmachers

Abstract: In this work, we augment reinforcement learning with an inference-time collision model to ensure safe and efficient container management in a waste-sorting facility with limited processing capacity. Each container has two optimal emptying volumes that trade off higher throughput against overflow risk. Conventional reinforcement learning (RL) approaches struggle under delayed rewards, sparse critical events, and high-dimensional uncertainty, failing to consistently balance higher-volume empties with the risk of safety-limit violations. To address these challenges, we propose a hybrid method comprising: (1) a curriculum-learning pipeline that incrementally trains a PPO agent to handle delayed rewards and class imbalance, and (2) an offline pairwise collision model used at inference time to proactively avert collisions with minimal online cost. Experimental results show that our targeted inference-time collision checks significantly improve collision avoidance, reduce safety-limit violations, maintain high throughput, and scale effectively across varying container-to-PU ratios. These findings offer actionable guidelines for designing safe and efficient container-management systems in real-world facilities.

SortingEnv: An Extendable RL-Environment for an Industrial Sorting Process

Mar 13, 2025

Tom Maus, Nico Zengeler, Tobias Glasmachers

Abstract: We present a novel reinforcement learning (RL) environment designed to both optimize industrial sorting systems and study agent behavior in evolving spaces. In simulating material flow within a sorting process, our environment follows the idea of a digital twin, with operational parameters like belt speed and occupancy level. To reflect real-world challenges, we integrate common upgrades to industrial setups, like new sensors or advanced machinery. It thus includes two variants: a basic version focusing on discrete belt speed adjustments and an advanced version introducing multiple sorting modes and enhanced material composition observations. We detail the observation spaces, state update mechanisms, and reward functions for both environments. We further evaluate the efficiency of common RL algorithms like Proximal Policy Optimization (PPO), Deep Q-Networks (DQN), and Advantage Actor Critic (A2C) in comparison to a classical rule-based agent (RBA). This framework not only aids in optimizing industrial processes but also provides a foundation for studying agent behavior and transferability in evolving environments, offering insights into model performance and practical implications for real-world RL applications.

* Presented at the 12th International Conference on Industrial Engineering and Applications (ICIEA-EU), Munich, 2025. This article has been submitted to AIP Conference Proceedings. After it is published, it will be available in the AIP Digital Library.

Variable Metric Evolution Strategies for High-dimensional Multi-Objective Optimization

Dec 20, 2024

Tobias Glasmachers

Abstract: We design a class of variable metric evolution strategies well suited for high-dimensional problems. We target problems with many variables, not (necessarily) with many objectives. The construction combines two independent developments: efficient algorithms for scaling covariance matrix adaptation to high dimensions, and evolution strategies for multi-objective optimization. In order to design a specific instance of the class we first develop a (1+1) version of the limited memory matrix adaptation evolution strategy and then use an established standard construction to turn a population thereof into a state-of-the-art multi-objective optimizer with indicator-based selection. The method compares favorably to adaptation of the full covariance matrix.

Solving a Real-World Optimization Problem Using Proximal Policy Optimization with Curriculum Learning and Reward Engineering

Apr 03, 2024

Abhijeet Pendyala, Asma Atamna, Tobias Glasmachers

Abstract: We present a proximal policy optimization (PPO) agent trained through curriculum learning (CL) principles and meticulous reward engineering to optimize a real-world high-throughput waste sorting facility. Our work addresses the challenge of effectively balancing the competing objectives of operational safety, volume optimization, and minimizing resource usage. A vanilla agent trained from scratch on these multiple criteria fails to solve the problem due to its inherent complexities. This problem is particularly difficult due to the environment's extremely delayed rewards with long time horizons and class (or action) imbalance, with important actions being infrequent in the optimal policy. This forces the agent to anticipate long-term action consequences and prioritize rare but rewarding behaviours, creating a non-trivial reinforcement learning task. Our five-stage CL approach tackles these challenges by gradually increasing the complexity of the environmental dynamics during policy transfer while simultaneously refining the reward mechanism. This iterative and adaptable process enables the agent to learn a desired optimal policy. Results demonstrate that our approach significantly improves inference-time safety, achieving near-zero safety violations in addition to enhancing waste sorting plant efficiency.

ProtoP-OD: Explainable Object Detection with Prototypical Parts

Feb 29, 2024

Pavlos Rath-Manakidis, Frederik Strothmann, Tobias Glasmachers, Laurenz Wiskott

Abstract: Interpretation and visualization of the behavior of detection transformers tend to highlight the locations in the image that the model attends to, but they provide limited insight into the semantics that the model is focusing on. This paper introduces an extension to detection transformers that constructs prototypical local features and uses them in object detection. These custom features, which we call prototypical parts, are designed to be mutually exclusive and align with the classifications of the model. The proposed extension consists of a bottleneck module, the prototype neck, that computes a discretized representation of prototype activations and a new loss term that matches prototypes to object classes. This setup leads to interpretable representations in the prototype neck, allowing visual inspection of the image content perceived by the model and a better understanding of the model's reliability. We show experimentally that our method incurs only a limited performance penalty, and we provide examples that demonstrate the quality of the explanations provided by our method, which we argue outweighs the performance penalty.

* 9 pages, 11 figures

Ruhr Hand Motion Catalog of Human Center-Out Transport Trajectories in 3D Task-Space Captured by a Redundant Measurement System

Dec 31, 2023

Tim Sziburis, Susanne Blex, Tobias Glasmachers, Ioannis Iossifidis

Abstract: Neurological conditions are a major source of movement disorders. Motion modelling and variability analysis have the potential to identify pathology but require profound data. We introduce a systematic dataset of 3D center-out task-space trajectories of human hand transport movements in a natural setting. The transport tasks of this study consist of grasping a cylindrical object from a unified start position and transporting it to one of nine target locations in unconstrained operational space. The measurement procedure is automated to record ten trials per target location. With that, the dataset consists of 90 movement trajectories for each hand of 31 participants without known movement disorders. The participants are aged between 21 and 78 years, covering a wide range. Data are recorded redundantly by both an optical tracking system and an IMU sensor. As opposed to the stationary capturing system, the IMU can be considered as a portable, low-cost and energy-efficient alternative to be implemented on embedded systems.

Leveraging Topological Maps in Deep Reinforcement Learning for Multi-Object Navigation

Oct 16, 2023

Simon Hakenes, Tobias Glasmachers

Abstract: This work addresses the challenge of navigating expansive spaces with sparse rewards through Reinforcement Learning (RL). Using topological maps, we elevate elementary actions to object-oriented macro actions, enabling a simple Deep Q-Network (DQN) agent to solve otherwise practically impossible environments.

* Extended Abstract, Northern Lights Deep Learning Conference 2024, 3 pages, 2 figures

Understanding Activation Patterns in Artificial Neural Networks by Exploring Stochastic Processes

Aug 01, 2023

Stephan Johann Lehmler, Muhammad Saif-ur-Rehman, Tobias Glasmachers, Ioannis Iossifidis

Abstract: To gain a deeper understanding of the behavior and learning dynamics of (deep) artificial neural networks, it is valuable to employ mathematical abstractions and models. These tools provide a simplified perspective on network performance and facilitate systematic investigations through simulations. In this paper, we propose utilizing the framework of stochastic processes, which has been underutilized thus far. Our approach models activation patterns of thresholded nodes in (deep) artificial neural networks as stochastic processes. We focus solely on activation frequency, leveraging neuroscience techniques used for real neuron spike trains. During a classification task, we extract spiking activity and use an arrival process following the Poisson distribution. We examine observed data from various artificial neural networks in image recognition tasks, fitting the proposed model's assumptions. Through this, we derive parameters describing activation patterns in each network. Our analysis covers randomly initialized, generalizing, and memorizing networks, revealing consistent differences across architectures and training sets. Calculating Mean Firing Rate, Mean Fano Factor, and Variances, we find stable indicators of memorization during learning, providing valuable insights into network behavior. The proposed model shows promise in describing activation patterns and could serve as a general framework for future investigations. It has potential applications in theoretical simulations, pruning, and transfer learning.
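
The statistics named in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `firing_statistics`, the layout of the count matrix (one row per node, one column per observation window), and the synthetic Poisson data are all assumptions made for illustration. The Fano factor used here is the standard definition, the variance of the counts divided by their mean, which is close to 1 for Poisson-distributed counts.

```python
import numpy as np

def firing_statistics(spike_counts):
    """Per-node mean count, variance, and Fano factor.

    spike_counts: array of shape (n_nodes, n_windows), where each entry is
    the number of times a thresholded node was active in one window
    (hypothetical layout, chosen for this sketch).
    """
    counts = np.asarray(spike_counts, dtype=float)
    mean = counts.mean(axis=1)
    var = counts.var(axis=1, ddof=1)  # unbiased sample variance
    # The Fano factor is undefined for nodes that never fire; mark them NaN.
    fano = np.full_like(mean, np.nan)
    active = mean > 0
    fano[active] = var[active] / mean[active]
    return mean, var, fano

# Synthetic stand-in for extracted activations: 100 nodes, 2000 windows,
# Poisson counts with rate 4 per window.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=4.0, size=(100, 2000))

rates, variances, fano = firing_statistics(counts)
mean_firing_rate = rates.mean()
mean_fano = np.nanmean(fano)
print(f"mean firing rate: {mean_firing_rate:.3f}")
print(f"mean Fano factor: {mean_fano:.3f}")
```

On counts extracted from a real network, a mean Fano factor far from 1 would suggest that a plain Poisson arrival process fits the activations poorly.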
