Rodrigo Toro Icarte

Data Distributional Properties As Inductive Bias for Systematic Generalization

Feb 27, 2025

Felipe del Río, Alain Raymond-Sáez, Daniel Florea, Rodrigo Toro Icarte, Julio Hurtado, Cristián Buc Calderón, Álvaro Soto

Abstract: Deep neural networks (DNNs) struggle at systematic generalization (SG). Several studies have evaluated the possibility of promoting SG through novel architectures, loss functions, or training methodologies. Few studies, however, have focused on the role of training data properties in promoting SG. In this work, we investigate the impact of certain data distributional properties as inductive biases for the SG ability of a multi-modal language model. To this end, we study three different properties. First, data diversity, instantiated as an increase in the number of possible values a latent property in the training distribution may take. Second, burstiness, where we probabilistically restrict the number of possible values of latent factors on particular inputs during training. Third, latent intervention, where a particular latent factor is altered randomly during training. We find that all three factors significantly enhance SG, with diversity contributing an 89% absolute increase in accuracy in the most affected property. Through a series of experiments, we test various hypotheses to understand why these properties promote SG. Finally, we find that Normalized Mutual Information (NMI) between latent attributes in the training distribution is strongly predictive of out-of-distribution generalization. We find that one mechanism by which lower NMI induces SG lies in the geometry of representations: lower NMI induces more parallelism in the model's neural representations (i.e., input features coded in parallel neural vectors), a property related to the capacity for reasoning by analogy.
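
To make the NMI quantity concrete, here is a minimal sketch of computing NMI between two discrete latent attributes (say, shape and color) observed on the same training samples. The abstract does not state which normalization is used. This sketch divides the mutual information by the arithmetic mean of the two entropies, which is an assumption; the function name is likewise illustrative.

```python
import math
from collections import Counter


def normalized_mutual_information(xs, ys):
    """NMI between two discrete attributes observed on the same samples.

    Assumption: I(X;Y) is normalized by the arithmetic mean of H(X) and
    H(Y); other normalizations (e.g. sqrt(H(X) * H(Y))) are also common.
    """
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))

    def entropy(counts):
        # Shannon entropy (in nats) of an empirical distribution.
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    hx, hy = entropy(px), entropy(py)
    # Empirical mutual information: sum over observed (x, y) pairs.
    mi = sum(
        (c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )
    denom = (hx + hy) / 2
    return 0.0 if denom == 0 else mi / denom


# Shape fully determines color: maximal NMI.
coupled = normalized_mutual_information(
    ["cube", "sphere", "cube", "sphere"], ["red", "blue", "red", "blue"])
# Every shape appears with every color: zero NMI.
decoupled = normalized_mutual_information(
    ["cube", "cube", "sphere", "sphere"], ["red", "blue", "red", "blue"])
```

Here `coupled` evaluates to 1.0 and `decoupled` to 0.0. In the abstract's terms, a training distribution resembling the second case (low NMI between latents) is the one predicted to generalize better.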

Being Considerate as a Pathway Towards Pluralistic Alignment for Agentic AI

Nov 15, 2024

Parand A. Alamdari, Toryn Q. Klassen, Rodrigo Toro Icarte, Sheila A. McIlraith

Abstract: Pluralistic alignment is concerned with ensuring that an AI system's objectives and behaviors are in harmony with the diversity of human values and perspectives. In this paper we study pluralistic alignment in the context of agentic AI, in particular an agent that is trying to learn a policy in a manner that is mindful of the values and perspectives of others in the environment. To this end, we show how being considerate of the future wellbeing and agency of other (human) agents can promote a form of pluralistic alignment.

* Pluralistic Alignment Workshop at NeurIPS 2024

Reward Machines for Deep RL in Noisy and Uncertain Environments

May 31, 2024

Andrew C. Li, Zizhao Chen, Toryn Q. Klassen, Pashootan Vaezipoor, Rodrigo Toro Icarte, Sheila A. McIlraith

Abstract: Reward Machines provide an automata-inspired structure for specifying instructions, safety constraints, and other temporally extended reward-worthy behaviour. By exposing complex reward function structure, they enable counterfactual learning updates that have resulted in impressive sample efficiency gains. While Reward Machines have been employed in both tabular and deep RL settings, they have typically relied on a ground-truth interpretation of the domain-specific vocabulary that forms the building blocks of the reward function. Such ground-truth interpretations can be elusive in many real-world settings, due in part to partial observability or noisy sensing. In this paper, we explore the use of Reward Machines for Deep RL in noisy and uncertain environments. We characterize this problem as a POMDP and propose a suite of RL algorithms that leverage task structure under uncertain interpretation of domain-specific vocabulary. Theoretical analysis exposes pitfalls in naive approaches to this problem, while experimental results show that our algorithms successfully leverage task structure to improve performance under noisy interpretations of the vocabulary. Our results provide a general framework for exploiting Reward Machines in partially observable environments.
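
For readers new to the formalism, below is a minimal sketch of a deterministic Reward Machine driven by the kind of ground-truth labels the abstract describes. Each step's label is the set of propositions that hold, and an edge fires when its proposition is in the label. This is a generic illustration, not the paper's implementation. The class name, the first-match edge semantics, and the coffee-delivery task with propositions `"c"` and `"o"` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RewardMachine:
    """Minimal deterministic reward machine (illustrative sketch)."""
    initial_state: int
    # state -> list of (proposition, next_state, reward); the first edge
    # whose proposition is true in the current label fires.
    transitions: dict

    def step(self, state, label):
        for prop, next_state, reward in self.transitions.get(state, []):
            if prop in label:
                return next_state, reward
        return state, 0.0  # no edge fires: stay in place, no reward

    def run(self, labels):
        """Feed a sequence of labels; return final RM state and total reward."""
        state, total = self.initial_state, 0.0
        for label in labels:
            state, reward = self.step(state, label)
            total += reward
        return state, total


# Hypothetical task: get coffee ("c"), then deliver it to the office ("o").
coffee_rm = RewardMachine(
    initial_state=0,
    transitions={0: [("c", 1, 0.0)], 1: [("o", 2, 1.0)]},
)
```

With this machine, `coffee_rm.run([set(), {"o"}, {"c"}, set(), {"o"}])` returns `(2, 1.0)`. Visiting the office before getting coffee earns nothing, because the reward depends on the history summarized by the RM state. The whole construction assumes the labels are correct, which is exactly the assumption the paper relaxes.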

Learning Symbolic Representations for Reinforcement Learning of Non-Markovian Behavior

Jan 08, 2023

Phillip J. K. Christoffersen, Andrew C. Li, Rodrigo Toro Icarte, Sheila A. McIlraith

Abstract: Many real-world reinforcement learning (RL) problems necessitate learning complex, temporally extended behavior that may only receive a reward signal when the behavior is completed. If the reward-worthy behavior is known, it can be specified in terms of a non-Markovian reward function, i.e., a function that depends on aspects of the state-action history rather than just the current state and action. Such reward functions yield sparse rewards, necessitating an inordinate number of experiences to find a policy that captures the reward-worthy pattern of behavior. Recent work has leveraged Knowledge Representation (KR) to provide a symbolic abstraction of aspects of the state that summarize reward-relevant properties of the state-action history and support learning a Markovian decomposition of the problem in terms of an automaton over the KR. Providing such a decomposition has been shown to vastly improve learning rates, especially when coupled with algorithms that exploit automaton structure. Nevertheless, such techniques rely on a priori knowledge of the KR. In this work, we explore how to automatically discover useful state abstractions that support learning automata over the state-action history. The result is an end-to-end algorithm that can learn optimal policies with significantly fewer environment samples than state-of-the-art RL on simple non-Markovian domains.

* 7 pages, 2 figures, presented at the KR2ML workshop at NeurIPS 2020

Noisy Symbolic Abstractions for Deep RL: A case study with Reward Machines

Nov 23, 2022

Andrew C. Li, Zizhao Chen, Pashootan Vaezipoor, Toryn Q. Klassen, Rodrigo Toro Icarte, Sheila A. McIlraith

Abstract: Natural and formal languages provide an effective mechanism for humans to specify instructions and reward functions. We investigate how to generate policies via RL when reward functions are specified in a symbolic language captured by Reward Machines, an increasingly popular automaton-inspired structure. We are interested in the case where the mapping of environment state to a symbolic (here, Reward Machine) vocabulary, commonly known as the labelling function, is uncertain from the perspective of the agent. We formulate the problem of policy learning in Reward Machines with noisy symbolic abstractions as a special class of POMDP optimization problem, and investigate several methods to address it, building on both existing and new techniques; the latter focus on predicting Reward Machine state rather than on grounding individual symbols. We analyze these methods and evaluate them experimentally under varying degrees of uncertainty in the correct interpretation of the symbolic vocabulary. We verify the strength of our approach and the limitations of existing methods via an empirical investigation on both illustrative toy domains and partially observable deep RL domains.

* NeurIPS Deep Reinforcement Learning Workshop 2022

Challenges to Solving Combinatorially Hard Long-Horizon Deep RL Tasks

Jun 03, 2022

Andrew C. Li, Pashootan Vaezipoor, Rodrigo Toro Icarte, Sheila A. McIlraith

Abstract: Deep reinforcement learning has shown promise in discrete domains requiring complex reasoning, including games such as Chess, Go, and Hanabi. However, this type of reasoning is less often observed in long-horizon, continuous domains with high-dimensional observations, where RL research has instead predominantly focused on problems with simple high-level structure (e.g., opening a drawer or moving a robot as fast as possible). Inspired by combinatorially hard optimization problems, we propose a set of robotics tasks which admit many distinct solutions at the high level, but require reasoning about states and rewards thousands of steps into the future for the best performance. Critically, while RL has traditionally struggled on complex, long-horizon tasks due to sparse rewards, our tasks are carefully designed to be solvable without specialized exploration. Nevertheless, our investigation finds that standard RL methods often neglect long-term effects due to discounting, while general-purpose hierarchical RL approaches struggle unless additional abstract domain knowledge can be exploited.
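
A quick back-of-the-envelope computation shows why discounting can hide long-term effects. Under a discount factor γ, a reward k steps in the future is scaled by γ^k. The discount factors and horizons below are illustrative choices, not values taken from the paper.

```python
def discounted_weight(gamma, steps):
    """Weight that a reward `steps` steps in the future receives today."""
    return gamma ** steps


for gamma in (0.99, 0.999):
    for steps in (100, 1000, 5000):
        print(f"gamma={gamma}, steps={steps}: "
              f"{discounted_weight(gamma, steps):.2e}")
```

With γ = 0.99, a reward 1,000 steps away is weighted by roughly 4.3e-5. Consequences "thousands of steps into the future" therefore barely register in the return unless γ is pushed very close to 1.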

Learning Reward Machines: A Study in Partially Observable Reinforcement Learning

Dec 17, 2021

Rodrigo Toro Icarte, Ethan Waldie, Toryn Q. Klassen, Richard Valenzano, Margarita P. Castro, Sheila A. McIlraith

Abstract: Reinforcement learning (RL) is a central problem in artificial intelligence. This problem consists of defining artificial agents that can learn optimal behaviour by interacting with an environment, where the optimal behaviour is defined with respect to a reward signal that the agent seeks to maximize. Reward machines (RMs) provide a structured, automata-based representation of a reward function that enables an RL agent to decompose an RL problem into structured subproblems that can be efficiently learned via off-policy learning. Here we show that RMs can be learned from experience, instead of being specified by the user, and that the resulting problem decomposition can be used to effectively solve partially observable RL problems. We pose the task of learning RMs as a discrete optimization problem where the objective is to find an RM that decomposes the problem into a set of subproblems such that the combination of their optimal memoryless policies is an optimal policy for the original problem. We show the effectiveness of this approach on three partially observable domains, where it significantly outperforms A3C, PPO, and ACER, and discuss its advantages, limitations, and broader potential.

AppBuddy: Learning to Accomplish Tasks in Mobile Apps via Reinforcement Learning

Jun 06, 2021

Maayan Shvo, Zhiming Hu, Rodrigo Toro Icarte, Iqbal Mohomed, Allan Jepson, Sheila A. McIlraith

Abstract: Human beings, even small children, quickly become adept at figuring out how to use applications on their mobile devices. Learning to use a new app is often achieved via trial-and-error, accelerated by transfer of knowledge from past experiences with similar apps. The prospect of building a smarter smartphone, one that can learn how to achieve tasks using mobile apps, is tantalizing. In this paper we explore the use of Reinforcement Learning (RL) with the goal of advancing this aspiration. We introduce an RL-based framework for learning to accomplish tasks in mobile apps. RL agents are provided with states derived from the underlying representation of on-screen elements, and rewards that are based on progress made in the task. Agents can interact with screen elements by tapping or typing. Our experimental results, over a number of mobile apps, show that RL agents can learn to accomplish multi-step tasks, as well as achieve modest generalization across different apps. More generally, we develop a platform which addresses several engineering challenges to enable an effective RL training environment. Our AppBuddy platform is compatible with OpenAI Gym and includes a suite of mobile apps and benchmark tasks that supports a diversity of RL research in the mobile app setting.

Be Considerate: Objectives, Side Effects, and Deciding How to Act

Jun 04, 2021

Parand Alizadeh Alamdari, Toryn Q. Klassen, Rodrigo Toro Icarte, Sheila A. McIlraith

Abstract: Recent work in AI safety has highlighted that in sequential decision making, objectives are often underspecified or incomplete. This gives discretion to the acting agent to realize the stated objective in ways that may result in undesirable outcomes. We contend that to learn to act safely, a reinforcement learning (RL) agent should include contemplation of the impact of its actions on the wellbeing and agency of others in the environment, including other acting agents and reactive processes. We endow RL agents with the ability to contemplate such impact by augmenting their reward based on the expected future return of others in the environment, providing different criteria for characterizing impact. We further endow these agents with the ability to differentially factor this impact into their decision making, manifesting behavior that ranges from self-centred to selfless, as demonstrated by experiments in gridworld environments.

LTL2Action: Generalizing LTL Instructions for Multi-Task RL

Feb 25, 2021

Pashootan Vaezipoor, Andrew Li, Rodrigo Toro Icarte, Sheila McIlraith

Abstract: We address the problem of teaching a deep reinforcement learning (RL) agent to follow instructions in multi-task environments. The combinatorial task sets we target consist of up to $\sim 10^{39}$ unique tasks. We employ a well-known formal language, linear temporal logic (LTL), to specify instructions, using a domain-specific vocabulary. We propose a novel approach to learning that exploits the compositional syntax and the semantics of LTL, enabling our RL agent to learn task-conditioned policies that generalize to new instructions not observed during training. The expressive power of LTL supports the specification of a diversity of complex temporally extended behaviours that include conditionals and alternative realizations. To reduce the overhead of learning LTL semantics, we introduce an environment-agnostic LTL pretraining scheme which improves sample-efficiency in downstream environments. Experiments on discrete and continuous domains demonstrate the strength of our approach in learning to solve (unseen) tasks, given LTL instructions.
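
The abstract refers to exploiting LTL's compositional semantics without spelling out the mechanics. A standard tool from the planning literature for tracking progress on an LTL instruction is LTL progression. Progression rewrites a formula after each environment step into the formula that remains to be satisfied. The sketch below covers a small fragment of LTL and is offered as an illustration of that idea, not as the paper's implementation. The tuple encoding of formulas and the proposition names are assumptions.

```python
# Formulas are nested tuples: ("atom", p), ("not", f), ("and", f, g),
# ("or", f, g), ("next", f), ("until", f, g), ("eventually", f),
# ("always", f). Python True/False are the truth constants.


def _not(f):
    return (not f) if isinstance(f, bool) else ("not", f)


def _and(f, g):
    # Conjunction with constant simplification.
    if f is False or g is False:
        return False
    if f is True:
        return g
    return f if g is True else ("and", f, g)


def _or(f, g):
    # Disjunction with constant simplification.
    if f is True or g is True:
        return True
    if f is False:
        return g
    return f if g is False else ("or", f, g)


def progress(formula, label):
    """Return what remains of `formula` after a step where exactly the
    propositions in `label` are true."""
    if isinstance(formula, bool):
        return formula
    op = formula[0]
    if op == "atom":
        return formula[1] in label
    if op == "not":
        return _not(progress(formula[1], label))
    if op == "and":
        return _and(progress(formula[1], label), progress(formula[2], label))
    if op == "or":
        return _or(progress(formula[1], label), progress(formula[2], label))
    if op == "next":
        return formula[1]
    if op == "until":  # f U g: g now, or f now and f U g still pending
        return _or(progress(formula[2], label),
                   _and(progress(formula[1], label), formula))
    if op == "eventually":
        return _or(progress(formula[1], label), formula)
    if op == "always":
        return _and(progress(formula[1], label), formula)
    raise ValueError(f"unknown operator: {op}")


# Hypothetical instruction: avoid lava until reaching the goal.
avoid_lava = ("until", ("not", ("atom", "lava")), ("atom", "goal"))
```

Here `progress(avoid_lava, set())` returns the same formula, meaning the task is still pending. Progressing with `{"goal"}` yields `True` (task satisfied) and with `{"lava"}` yields `False` (task violated). A task-conditioned policy can then be conditioned on the progressed formula at each step.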
