Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Somjit Nath

Behaviour Discovery and Attribution for Explainable Reinforcement Learning

Mar 19, 2025

Rishav Rishav, Somjit Nath, Vincent Michalski, Samira Ebrahimi Kahou

Abstract:Explaining the decisions made by reinforcement learning (RL) agents is critical for building trust and ensuring reliability in real-world applications. Traditional approaches to explainability often rely on saliency analysis, which can be limited in providing actionable insights. Recently, there has been growing interest in attributing RL decisions to specific trajectories within a dataset. However, these methods often generalize explanations to long trajectories, potentially involving multiple distinct behaviors. Often, providing multiple more fine grained explanations would improve clarity. In this work, we propose a framework for behavior discovery and action attribution to behaviors in offline RL trajectories. Our method identifies meaningful behavioral segments, enabling more precise and granular explanations associated with high level agent behaviors. This approach is adaptable across diverse environments with minimal modifications, offering a scalable and versatile solution for behavior discovery and attribution for explainable RL.

Via

Access Paper or Ask Questions

Unsupervised Event Outlier Detection in Continuous Time

Nov 25, 2024

Somjit Nath, Yik Chau Lui, Siqi Liu

Figure 1 for Unsupervised Event Outlier Detection in Continuous Time

Figure 2 for Unsupervised Event Outlier Detection in Continuous Time

Figure 3 for Unsupervised Event Outlier Detection in Continuous Time

Figure 4 for Unsupervised Event Outlier Detection in Continuous Time

Abstract:Event sequence data record the occurrences of events in continuous time. Event sequence forecasting based on temporal point processes (TPPs) has been extensively studied, but outlier or anomaly detection, especially without any supervision from humans, is still underexplored. In this work, we develop, to the best our knowledge, the first unsupervised outlier detection approach to detecting abnormal events. Our novel unsupervised outlier detection framework is based on ideas from generative adversarial networks (GANs) and reinforcement learning (RL). We train a 'generator' that corrects outliers in the data with a 'discriminator' that learns to discriminate the corrected data from the real data, which may contain outliers. A key insight is that if the generator made a mistake in the correction, it would generate anomalies that are different from the anomalies in the real data, so it serves as data augmentation for the discriminator learning. Different from typical GAN-based outlier detection approaches, our method employs the generator to detect outliers in an online manner. The experimental results show that our method can detect event outliers more accurately than the state-of-the-art approaches.

Via

Access Paper or Ask Questions

Spectral Temporal Contrastive Learning

Dec 07, 2023

Sacha Morin, Somjit Nath, Samira Ebrahimi Kahou, Guy Wolf

Figure 1 for Spectral Temporal Contrastive Learning

Figure 2 for Spectral Temporal Contrastive Learning

Figure 3 for Spectral Temporal Contrastive Learning

Abstract:Learning useful data representations without requiring labels is a cornerstone of modern deep learning. Self-supervised learning methods, particularly contrastive learning (CL), have proven successful by leveraging data augmentations to define positive pairs. This success has prompted a number of theoretical studies to better understand CL and investigate theoretical bounds for downstream linear probing tasks. This work is concerned with the temporal contrastive learning (TCL) setting where the sequential structure of the data is used instead to define positive pairs, which is more commonly used in RL and robotics contexts. In this paper, we adapt recent work on Spectral CL to formulate Spectral Temporal Contrastive Learning (STCL). We discuss a population loss based on a state graph derived from a time-homogeneous reversible Markov chain with uniform stationary distribution. The STCL loss enables to connect the linear probing performance to the spectral properties of the graph, and can be estimated by considering previously observed data sequences as an ensemble of MCMC chains.

* Accepted to Self-Supervised Learning - Theory and Practice, NeurIPS Workshop, 2023

Via

Access Paper or Ask Questions

Discovering Object-Centric Generalized Value Functions From Pixels

Apr 27, 2023

Somjit Nath, Gopeshh Raaj Subbaraj, Khimya Khetarpal, Samira Ebrahimi Kahou

Figure 1 for Discovering Object-Centric Generalized Value Functions From Pixels

Figure 2 for Discovering Object-Centric Generalized Value Functions From Pixels

Figure 3 for Discovering Object-Centric Generalized Value Functions From Pixels

Figure 4 for Discovering Object-Centric Generalized Value Functions From Pixels

Abstract:Deep Reinforcement Learning has shown significant progress in extracting useful representations from high-dimensional inputs albeit using hand-crafted auxiliary tasks and pseudo rewards. Automatically learning such representations in an object-centric manner geared towards control and fast adaptation remains an open research problem. In this paper, we introduce a method that tries to discover meaningful features from objects, translating them to temporally coherent "question" functions and leveraging the subsequent learned general value functions for control. We compare our approach with state-of-the-art techniques alongside other ablations and show competitive performance in both stationary and non-stationary settings. Finally, we also investigate the discovered general value functions and through qualitative analysis show that the learned representations are not only interpretable but also, centered around objects that are invariant to changes across tasks facilitating fast adaptation.

* Accepted at ICML 2023

Via

Access Paper or Ask Questions

Locally Constrained Representations in Reinforcement Learning

Sep 20, 2022

Somjit Nath, Samira Ebrahimi Kahou

Figure 1 for Locally Constrained Representations in Reinforcement Learning

Figure 2 for Locally Constrained Representations in Reinforcement Learning

Figure 3 for Locally Constrained Representations in Reinforcement Learning

Figure 4 for Locally Constrained Representations in Reinforcement Learning

Abstract:The success of Reinforcement Learning (RL) heavily relies on the ability to learn robust representations from the observations of the environment. In most cases, the representations learned purely by the reinforcement learning loss can differ vastly across states depending on how the value functions change. However, the representations learned need not be very specific to the task at hand. Relying only on the RL objective may yield representations that vary greatly across successive time steps. In addition, since the RL loss has a changing target, the representations learned would depend on how good the current values/policies are. Thus, disentangling the representations from the main task would allow them to focus more on capturing transition dynamics which can improve generalization. To this end, we propose locally constrained representations, where an auxiliary loss forces the state representations to be predictable by the representations of the neighbouring states. This encourages the representations to be driven not only by the value/policy learning but also self-supervised learning, which constrains the representations from changing too rapidly. We evaluate the proposed method on several known benchmarks and observe strong performance. Especially in continuous control tasks, our experiments show a significant advantage over a strong baseline.

Via

Access Paper or Ask Questions

Prioritizing Samples in Reinforcement Learning with Reducible Loss

Aug 22, 2022

Shivakanth Sujit, Somjit Nath, Pedro H. M. Braga, Samira Ebrahimi Kahou

Figure 1 for Prioritizing Samples in Reinforcement Learning with Reducible Loss

Figure 2 for Prioritizing Samples in Reinforcement Learning with Reducible Loss

Figure 3 for Prioritizing Samples in Reinforcement Learning with Reducible Loss

Figure 4 for Prioritizing Samples in Reinforcement Learning with Reducible Loss

Abstract:Most reinforcement learning algorithms take advantage of an experience replay buffer to repeatedly train on samples the agent has observed in the past. This prevents catastrophic forgetting, however simply assigning equal importance to each of the samples is a naive strategy. In this paper, we propose a method to prioritize samples based on how much we can learn from a sample. We define the learn-ability of a sample as the steady decrease of the training loss associated with this sample over time. We develop an algorithm to prioritize samples with high learn-ability, while assigning lower priority to those that are hard-to-learn, typically caused by noise or stochasticity. We empirically show that our method is more robust than random sampling and also better than just prioritizing with respect to the training loss, i.e. the temporal difference loss, which is used in vanilla prioritized experience replay.

Via

Access Paper or Ask Questions

A Learning Based Framework for Handling Uncertain Lead Times in Multi-Product Inventory Management

Mar 09, 2022

Hardik Meisheri, Somjit Nath, Mayank Baranwal, Harshad Khadilkar

Figure 1 for A Learning Based Framework for Handling Uncertain Lead Times in Multi-Product Inventory Management

Figure 2 for A Learning Based Framework for Handling Uncertain Lead Times in Multi-Product Inventory Management

Figure 3 for A Learning Based Framework for Handling Uncertain Lead Times in Multi-Product Inventory Management

Figure 4 for A Learning Based Framework for Handling Uncertain Lead Times in Multi-Product Inventory Management

Abstract:Most existing literature on supply chain and inventory management consider stochastic demand processes with zero or constant lead times. While it is true that in certain niche scenarios, uncertainty in lead times can be ignored, most real-world scenarios exhibit stochasticity in lead times. These random fluctuations can be caused due to uncertainty in arrival of raw materials at the manufacturer's end, delay in transportation, an unforeseen surge in demands, and switching to a different vendor, to name a few. Stochasticity in lead times is known to severely degrade the performance in an inventory management system, and it is only fair to abridge this gap in supply chain system through a principled approach. Motivated by the recently introduced delay-resolved deep Q-learning (DRDQN) algorithm, this paper develops a reinforcement learning based paradigm for handling uncertainty in lead times (\emph{action delay}). Through empirical evaluations, it is further shown that the inventory management with uncertain lead times is not only equivalent to that of delay in information sharing across multiple echelons (\emph{observation delay}), a model trained to handle one kind of delay is capable to handle delays of another kind without requiring to be retrained. Finally, we apply the delay-resolved framework to scenarios comprising of multiple products subjected to stochasticity in lead times, and elucidate how the delay-resolved framework negates the effect of any delay to achieve near-optimal performance.

Via

Access Paper or Ask Questions

Follow your Nose: Using General Value Functions for Directed Exploration in Reinforcement Learning

Mar 02, 2022

Somjit Nath, Omkar Shelke, Durgesh Kalwar, Hardik Meisheri, Harshad Khadilkar

Figure 1 for Follow your Nose: Using General Value Functions for Directed Exploration in Reinforcement Learning

Figure 2 for Follow your Nose: Using General Value Functions for Directed Exploration in Reinforcement Learning

Figure 3 for Follow your Nose: Using General Value Functions for Directed Exploration in Reinforcement Learning

Figure 4 for Follow your Nose: Using General Value Functions for Directed Exploration in Reinforcement Learning

Abstract:Exploration versus exploitation dilemma is a significant problem in reinforcement learning (RL), particularly in complex environments with large state space and sparse rewards. When optimizing for a particular goal, running simple smaller tasks can often be a good way to learn additional information about the environment. Exploration methods have been used to sample better trajectories from the environment for improved performance while auxiliary tasks have been incorporated generally where the reward is sparse. If there is little reward signal available, the agent requires clever exploration strategies to reach parts of the state space that contain relevant sub-goals. However, that exploration needs to be balanced with the need for exploiting the learned policy. This paper explores the idea of combining exploration with auxiliary task learning using General Value Functions (GVFs) and a directed exploration strategy. We provide a simple way to learn options (sequences of actions) instead of having to handcraft them, and demonstrate the performance advantage in three navigation tasks.

Via

Access Paper or Ask Questions

Revisiting State Augmentation methods for Reinforcement Learning with Stochastic Delays

Aug 17, 2021

Somjit Nath, Mayank Baranwal, Harshad Khadilkar

Figure 1 for Revisiting State Augmentation methods for Reinforcement Learning with Stochastic Delays

Figure 2 for Revisiting State Augmentation methods for Reinforcement Learning with Stochastic Delays

Figure 3 for Revisiting State Augmentation methods for Reinforcement Learning with Stochastic Delays

Figure 4 for Revisiting State Augmentation methods for Reinforcement Learning with Stochastic Delays

Abstract:Several real-world scenarios, such as remote control and sensing, are comprised of action and observation delays. The presence of delays degrades the performance of reinforcement learning (RL) algorithms, often to such an extent that algorithms fail to learn anything substantial. This paper formally describes the notion of Markov Decision Processes (MDPs) with stochastic delays and shows that delayed MDPs can be transformed into equivalent standard MDPs (without delays) with significantly simplified cost structure. We employ this equivalence to derive a model-free Delay-Resolved RL framework and show that even a simple RL algorithm built upon this framework achieves near-optimal rewards in environments with stochastic delays in actions and observations. The delay-resolved deep Q-network (DRDQN) algorithm is bench-marked on a variety of environments comprising of multi-step and stochastic delays and results in better performance, both in terms of achieving near-optimal rewards and minimizing the computational overhead thereof, with respect to the currently established algorithms.

* Accepted at CIKM'21

Via

Access Paper or Ask Questions

Reinforcement Learning for Multi-Product Multi-Node Inventory Management in Supply Chains

Jun 07, 2020

Nazneen N Sultana, Hardik Meisheri, Vinita Baniwal, Somjit Nath, Balaraman Ravindran, Harshad Khadilkar

Figure 1 for Reinforcement Learning for Multi-Product Multi-Node Inventory Management in Supply Chains

Figure 2 for Reinforcement Learning for Multi-Product Multi-Node Inventory Management in Supply Chains

Figure 3 for Reinforcement Learning for Multi-Product Multi-Node Inventory Management in Supply Chains

Figure 4 for Reinforcement Learning for Multi-Product Multi-Node Inventory Management in Supply Chains

Abstract:This paper describes the application of reinforcement learning (RL) to multi-product inventory management in supply chains. The problem description and solution are both adapted from a real-world business solution. The novelty of this problem with respect to supply chain literature is (i) we consider concurrent inventory management of a large number (50 to 1000) of products with shared capacity, (ii) we consider a multi-node supply chain consisting of a warehouse which supplies three stores, (iii) the warehouse, stores, and transportation from warehouse to stores have finite capacities, (iv) warehouse and store replenishment happen at different time scales and with realistic time lags, and (v) demand for products at the stores is stochastic. We describe a novel formulation in a multi-agent (hierarchical) reinforcement learning framework that can be used for parallelised decision-making, and use the advantage actor critic (A2C) algorithm with quantised action spaces to solve the problem. Experiments show that the proposed approach is able to handle a multi-objective reward comprised of maximising product sales and minimising wastage of perishable products.

Via

Access Paper or Ask Questions