Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Djordje Grbic

Navigating Demand Uncertainty in Container Shipping: Deep Reinforcement Learning for Enabling Adaptive and Feasible Master Stowage Planning

Feb 19, 2025

Jaike van Twiller, Yossiri Adulyasak, Erick Delage, Djordje Grbic, Rune Møller Jensen

Abstract:Reinforcement learning (RL) has shown promise in solving various combinatorial optimization problems. However, conventional RL faces challenges when dealing with real-world constraints, especially when action space feasibility is explicit and dependent on the corresponding state or trajectory. In this work, we focus on using RL in container shipping, often considered the cornerstone of global trade, by dealing with the critical challenge of master stowage planning. The main objective is to maximize cargo revenue and minimize operational costs while navigating demand uncertainty and various complex operational constraints, namely vessel capacity and stability, which must be dynamically updated along the vessel's voyage. To address this problem, we implement a deep reinforcement learning framework with feasibility projection to solve the master stowage planning problem (MPP) under demand uncertainty. The experimental results show that our architecture efficiently finds adaptive, feasible solutions for this multi-stage stochastic optimization problem, outperforming traditional mixed-integer programming and RL with feasibility regularization. Our AI-driven decision-support policy enables adaptive and feasible planning under uncertainty, optimizing operational efficiency and capacity utilization while contributing to sustainable and resilient global supply chains.

* This paper is currently under review for IJCAI 2025

Via

Access Paper or Ask Questions

Procedurally generating rules to adapt difficulty for narrative puzzle games

Jul 07, 2023

Thomas Volden, Djordje Grbic, Paolo Burelli

Abstract:This paper focuses on procedurally generating rules and communicating them to players to adjust the difficulty. This is part of a larger project to collect and adapt games in educational games for young children using a digital puzzle game designed for kindergarten. A genetic algorithm is used together with a difficulty measure to find a target number of solution sets and a large language model is used to communicate the rules in a narrative context. During testing the approach was able to find rules that approximate any given target difficulty within two dozen generations on average. The approach was combined with a large language model to create a narrative puzzle game where players have to host a dinner for animals that can't get along. Future experiments will try to improve evaluation, specialize the language model on children's literature, and collect multi-modal data from players to guide adaptation.

Via

Access Paper or Ask Questions

Safer Reinforcement Learning through Transferable Instinct Networks

Jul 14, 2021

Djordje Grbic, Sebastian Risi

Figure 1 for Safer Reinforcement Learning through Transferable Instinct Networks

Figure 2 for Safer Reinforcement Learning through Transferable Instinct Networks

Figure 3 for Safer Reinforcement Learning through Transferable Instinct Networks

Figure 4 for Safer Reinforcement Learning through Transferable Instinct Networks

Abstract:Random exploration is one of the main mechanisms through which reinforcement learning (RL) finds well-performing policies. However, it can lead to undesirable or catastrophic outcomes when learning online in safety-critical environments. In fact, safe learning is one of the major obstacles towards real-world agents that can learn during deployment. One way of ensuring that agents respect hard limitations is to explicitly configure boundaries in which they can operate. While this might work in some cases, we do not always have clear a-priori information which states and actions can lead dangerously close to hazardous states. Here, we present an approach where an additional policy can override the main policy and offer a safer alternative action. In our instinct-regulated RL (IR^2L) approach, an "instinctual" network is trained to recognize undesirable situations, while guarding the learning policy against entering them. The instinct network is pre-trained on a single task where it is safe to make mistakes, and transferred to environments in which learning a new task safely is critical. We demonstrate IR^2L in the OpenAI Safety gym domain, in which it receives a significantly lower number of safety violations during training than a baseline RL approach while reaching similar task performance.

* The paper was accepted in the ALIFE 2021 conference

Via

Access Paper or Ask Questions

Growing 3D Artefacts and Functional Machines with Neural Cellular Automata

Mar 15, 2021

Shyam Sudhakaran, Djordje Grbic, Siyan Li, Adam Katona, Elias Najarro, Claire Glanois, Sebastian Risi

Figure 1 for Growing 3D Artefacts and Functional Machines with Neural Cellular Automata

Figure 2 for Growing 3D Artefacts and Functional Machines with Neural Cellular Automata

Figure 3 for Growing 3D Artefacts and Functional Machines with Neural Cellular Automata

Figure 4 for Growing 3D Artefacts and Functional Machines with Neural Cellular Automata

Abstract:Neural Cellular Automata (NCAs) have been proven effective in simulating morphogenetic processes, the continuous construction of complex structures from very few starting cells. Recent developments in NCAs lie in the 2D domain, namely reconstructing target images from a single pixel or infinitely growing 2D textures. In this work, we propose an extension of NCAs to 3D, utilizing 3D convolutions in the proposed neural network architecture. Minecraft is selected as the environment for our automaton since it allows the generation of both static structures and moving machines. We show that despite their simplicity, NCAs are capable of growing complex entities such as castles, apartment blocks, and trees, some of which are composed of over 3,000 blocks. Additionally, when trained for regeneration, the system is able to regrow parts of simple functional machines, significantly expanding the capabilities of simulated morphogenetic systems.

Via

Access Paper or Ask Questions

EvoCraft: A New Challenge for Open-Endedness

Dec 08, 2020

Djordje Grbic, Rasmus Berg Palm, Elias Najarro, Claire Glanois, Sebastian Risi

Figure 1 for EvoCraft: A New Challenge for Open-Endedness

Figure 2 for EvoCraft: A New Challenge for Open-Endedness

Figure 3 for EvoCraft: A New Challenge for Open-Endedness

Figure 4 for EvoCraft: A New Challenge for Open-Endedness

Abstract:This paper introduces EvoCraft, a framework for Minecraft designed to study open-ended algorithms. We introduce an API that provides an open-source Python interface for communicating with Minecraft to place and track blocks. In contrast to previous work in Minecraft that focused on learning to play the game, the grand challenge we pose here is to automatically search for increasingly complex artifacts in an open-ended fashion. Compared to other environments used to study open-endedness, Minecraft allows the construction of almost any kind of structure, including actuated machines with circuits and mechanical components. We present initial baseline results in evolving simple Minecraft creations through both interactive and automated evolution. While evolution succeeds when tasked to grow a structure towards a specific target, it is unable to find a solution when rewarded for creating a simple machine that moves. Thus, EvoCraft offers a challenging new environment for automated search methods (such as evolution) to find complex artifacts that we hope will spur the development of more open-ended algorithms. A Python implementation of the EvoCraft framework is available at: https://github.com/real-itu/Evocraft-py.

Via

Access Paper or Ask Questions

Safe Reinforcement Learning through Meta-learned Instincts

May 06, 2020

Djordje Grbic, Sebastian Risi

Figure 1 for Safe Reinforcement Learning through Meta-learned Instincts

Figure 2 for Safe Reinforcement Learning through Meta-learned Instincts

Figure 3 for Safe Reinforcement Learning through Meta-learned Instincts

Figure 4 for Safe Reinforcement Learning through Meta-learned Instincts

Abstract:An important goal in reinforcement learning is to create agents that can quickly adapt to new goals while avoiding situations that might cause damage to themselves or their environments. One way agents learn is through exploration mechanisms, which are needed to discover new policies. However, in deep reinforcement learning, exploration is normally done by injecting noise in the action space. While performing well in many domains, this setup has the inherent risk that the noisy actions performed by the agent lead to unsafe states in the environment. Here we introduce a novel approach called Meta-Learned Instinctual Networks (MLIN) that allows agents to safely learn during their lifetime while avoiding potentially hazardous states. At the core of the approach is a plastic network trained through reinforcement learning and an evolved "instinctual" network, which does not change during the agent's lifetime but can modulate the noisy output of the plastic network. We test our idea on a simple 2D navigation task with no-go zones, in which the agent has to learn to approach new targets during deployment. MLIN outperforms standard meta-trained networks and allows agents to learn to navigate to new targets without colliding with any of the no-go zones. These results suggest that meta-learning augmented with an instinctual network is a promising new approach for safe AI, which may enable progress in this area on a variety of different domains.

Via

Access Paper or Ask Questions