Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Roger Creus Castanyer

Stable Gradients for Stable Learning at Scale in Deep Reinforcement Learning

Jun 18, 2025

Roger Creus Castanyer, Johan Obando-Ceron, Lu Li, Pierre-Luc Bacon, Glen Berseth, Aaron Courville, Pablo Samuel Castro

Abstract:Scaling deep reinforcement learning networks is challenging and often results in degraded performance, yet the root causes of this failure mode remain poorly understood. Several recent works have proposed mechanisms to address this, but they are often complex and fail to highlight the causes underlying this difficulty. In this work, we conduct a series of empirical analyses which suggest that the combination of non-stationarity with gradient pathologies, due to suboptimal architectural choices, underlie the challenges of scale. We propose a series of direct interventions that stabilize gradient flow, enabling robust performance across a range of network depths and widths. Our interventions are simple to implement and compatible with well-established algorithms, and result in an effective mechanism that enables strong performance even at large scales. We validate our findings on a variety of agents and suites of environments.

Via

Access Paper or Ask Questions

RLeXplore: Accelerating Research in Intrinsically-Motivated Reinforcement Learning

May 29, 2024

Mingqi Yuan, Roger Creus Castanyer, Bo Li, Xin Jin, Glen Berseth, Wenjun Zeng

Abstract:Extrinsic rewards can effectively guide reinforcement learning (RL) agents in specific tasks. However, extrinsic rewards frequently fall short in complex environments due to the significant human effort needed for their design and annotation. This limitation underscores the necessity for intrinsic rewards, which offer auxiliary and dense signals and can enable agents to learn in an unsupervised manner. Although various intrinsic reward formulations have been proposed, their implementation and optimization details are insufficiently explored and lack standardization, thereby hindering research progress. To address this gap, we introduce RLeXplore, a unified, highly modularized, and plug-and-play framework offering reliable implementations of eight state-of-the-art intrinsic reward algorithms. Furthermore, we conduct an in-depth study that identifies critical implementation details and establishes well-justified standard practices in intrinsically-motivated RL. The source code for RLeXplore is available at https://github.com/RLE-Foundation/RLeXplore.

* 25 pages, 19 figures

Via

Access Paper or Ask Questions

Surprise-Adaptive Intrinsic Motivation for Unsupervised Reinforcement Learning

May 27, 2024

Adriana Hugessen, Roger Creus Castanyer, Faisal Mohamed, Glen Berseth

Figure 1 for Surprise-Adaptive Intrinsic Motivation for Unsupervised Reinforcement Learning

Figure 2 for Surprise-Adaptive Intrinsic Motivation for Unsupervised Reinforcement Learning

Figure 3 for Surprise-Adaptive Intrinsic Motivation for Unsupervised Reinforcement Learning

Figure 4 for Surprise-Adaptive Intrinsic Motivation for Unsupervised Reinforcement Learning

Abstract:Both entropy-minimizing and entropy-maximizing (curiosity) objectives for unsupervised reinforcement learning (RL) have been shown to be effective in different environments, depending on the environment's level of natural entropy. However, neither method alone results in an agent that will consistently learn intelligent behavior across environments. In an effort to find a single entropy-based method that will encourage emergent behaviors in any environment, we propose an agent that can adapt its objective online, depending on the entropy conditions by framing the choice as a multi-armed bandit problem. We devise a novel intrinsic feedback signal for the bandit, which captures the agent's ability to control the entropy in its environment. We demonstrate that such agents can learn to control entropy and exhibit emergent behaviors in both high- and low-entropy regimes and can learn skillful behaviors in benchmark tasks. Videos of the trained agents and summarized findings can be found on our project page https://sites.google.com/view/surprise-adaptive-agents

* Published at the Reinforcement Learning Conference 2024

Via

Access Paper or Ask Questions

Improving Intrinsic Exploration by Creating Stationary Objectives

Nov 03, 2023

Roger Creus Castanyer, Joshua Romoff, Glen Berseth

Abstract:Exploration bonuses in reinforcement learning guide long-horizon exploration by defining custom intrinsic objectives. Count-based methods use the frequency of state visits to derive an exploration bonus. In this paper, we identify that any intrinsic reward function derived from count-based methods is non-stationary and hence induces a difficult objective to optimize for the agent. The key contribution of our work lies in transforming the original non-stationary rewards into stationary rewards through an augmented state representation. For this purpose, we introduce the Stationary Objectives For Exploration (SOFE) framework. SOFE requires identifying sufficient statistics for different exploration bonuses and finding an efficient encoding of these statistics to use as input to a deep network. SOFE is based on proposing state augmentations that expand the state space but hold the promise of simplifying the optimization of the agent's objective. Our experiments show that SOFE improves the agents' performance in challenging exploration problems, including sparse-reward tasks, pixel-based observations, 3D navigation, and procedurally generated environments.

Via

Access Paper or Ask Questions

Centralized control for multi-agent RL in a complex Real-Time-Strategy game

Apr 25, 2023

Roger Creus Castanyer

Figure 1 for Centralized control for multi-agent RL in a complex Real-Time-Strategy game

Figure 2 for Centralized control for multi-agent RL in a complex Real-Time-Strategy game

Figure 3 for Centralized control for multi-agent RL in a complex Real-Time-Strategy game

Figure 4 for Centralized control for multi-agent RL in a complex Real-Time-Strategy game

Abstract:Multi-agent Reinforcement learning (MARL) studies the behaviour of multiple learning agents that coexist in a shared environment. MARL is more challenging than single-agent RL because it involves more complex learning dynamics: the observations and rewards of each agent are functions of all other agents. In the context of MARL, Real-Time Strategy (RTS) games represent very challenging environments where multiple players interact simultaneously and control many units of different natures all at once. In fact, RTS games are so challenging for the current RL methods, that just being able to tackle them with RL is interesting. This project provides the end-to-end experience of applying RL in the Lux AI v2 Kaggle competition, where competitors design agents to control variable-sized fleets of units and tackle a multi-variable optimization, resource gathering, and allocation problem in a 1v1 scenario against other competitors. We use a centralized approach for training the RL agents, and report multiple design decisions along the process. We provide the source code of the project: https://github.com/roger-creus/centralized-control-lux.

Via

Access Paper or Ask Questions

Which Design Decisions in AI-enabled Mobile Applications Contribute to Greener AI?

Sep 28, 2021

Roger Creus Castanyer, Silverio Martínez-Fernández, Xavier Franch

Figure 1 for Which Design Decisions in AI-enabled Mobile Applications Contribute to Greener AI?

Abstract:Background: The construction, evolution and usage of complex artificial intelligence (AI) models demand expensive computational resources. While currently available high-performance computing environments support well this complexity, the deployment of AI models in mobile devices, which is an increasing trend, is challenging. Mobile applications consist of environments with low computational resources and hence imply limitations in the design decisions during the AI-enabled software engineering lifecycle that balance the trade-off between the accuracy and the complexity of the mobile applications. Objective: Our objective is to systematically assess the trade-off between accuracy and complexity when deploying complex AI models (e.g. neural networks) to mobile devices, which have an implicit resource limitation. We aim to cover (i) the impact of the design decisions on the achievement of high-accuracy and low resource-consumption implementations; and (ii) the validation of profiling tools for systematically promoting greener AI. Method: This confirmatory registered report consists of a plan to conduct an empirical study to quantify the implications of the design decisions on AI-enabled applications performance and to report experiences of the end-to-end AI-enabled software engineering lifecycle. Concretely, we will implement both image-based and language-based neural networks in mobile applications to solve multiple image classification and text classification problems on different benchmark datasets. Overall, we plan to model the accuracy and complexity of AI-enabled applications in operation with respect to their design decisions and will provide tools for allowing practitioners to gain consciousness of the quantitative relationship between the design decisions and the green characteristics of study.

* Accepted as registered report at ESEM 2021

Via

Access Paper or Ask Questions

Integration of Convolutional Neural Networks in Mobile Applications

Mar 11, 2021

Roger Creus Castanyer, Silverio Martínez-Fernández, Xavier Franch

Figure 1 for Integration of Convolutional Neural Networks in Mobile Applications

Figure 2 for Integration of Convolutional Neural Networks in Mobile Applications

Figure 3 for Integration of Convolutional Neural Networks in Mobile Applications

Figure 4 for Integration of Convolutional Neural Networks in Mobile Applications

Abstract:When building Deep Learning (DL) models, data scientists and software engineers manage the trade-off between their accuracy, or any other suitable success criteria, and their complexity. In an environment with high computational power, a common practice is making the models go deeper by designing more sophisticated architectures. However, in the context of mobile devices, which possess less computational power, keeping complexity under control is a must. In this paper, we study the performance of a system that integrates a DL model as a trade-off between the accuracy and the complexity. At the same time, we relate the complexity to the efficiency of the system. With this, we present a practical study that aims to explore the challenges met when optimizing the performance of DL models becomes a requirement. Concretely, we aim to identify: (i) the most concerning challenges when deploying DL-based software in mobile applications; and (ii) the path for optimizing the performance trade-off. We obtain results that verify many of the identified challenges in the related work such as the availability of frameworks and the software-data dependency. We provide a documentation of our experience when facing the identified challenges together with the discussion of possible solutions to them. Additionally, we implement a solution to the sustainability of the DL models when deployed in order to reduce the severity of other identified challenges. Moreover, we relate the performance trade-off to a new defined challenge featuring the impact of the complexity in the obtained accuracy. Finally, we discuss and motivate future work that aims to provide solutions to the more open challenges found.

* Pre-print. Accepted and to be published in WAIN@ICSE 2021

Via

Access Paper or Ask Questions