Shlomo Zilberstein

University of Massachusetts Amherst

Proceedings of 1st Workshop on Advancing Artificial Intelligence through Theory of Mind

Apr 28, 2025

Mouad Abrini, Omri Abend, Dina Acklin, Henny Admoni, Gregor Aichinger, Nitay Alon, Zahra Ashktorab, Ashish Atreja, Moises Auron, Alexander Aufreiter(+98 more)

Abstract: This volume includes a selection of papers presented at the Workshop on Advancing Artificial Intelligence through Theory of Mind, held at AAAI 2025 in Philadelphia, US, on 3 March 2025. Its purpose is to provide an open-access, curated anthology for the Theory of Mind (ToM) and AI research community.

* workshop proceedings

Distributed Multi-Agent Coordination Using Multi-Modal Foundation Models

Jan 24, 2025

Saaduddin Mahmud, Dorian Benhamou Goldfajn, Shlomo Zilberstein

Abstract: Distributed Constraint Optimization Problems (DCOPs) offer a powerful framework for multi-agent coordination but often rely on labor-intensive, manual problem construction. To address this, we introduce VL-DCOPs, a framework that leverages large multimodal foundation models (LFMs) to automatically generate constraints from both visual and linguistic instructions. We then introduce a spectrum of agent archetypes for solving VL-DCOPs, ranging from a neuro-symbolic agent that delegates some algorithmic decisions to an LFM to a fully neural agent that relies entirely on an LFM for coordination. We evaluate these archetypes using state-of-the-art LLMs (large language models) and VLMs (vision language models) on three novel VL-DCOP tasks and compare their respective advantages and drawbacks. Finally, we discuss how this work extends to broader frontier challenges in the DCOP literature.
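For readers new to DCOPs, the following minimal Python sketch shows the underlying optimization problem: each agent owns one variable, binary constraints assign costs to pairs of values, and the goal is a joint assignment with minimum total cost. The three-agent graph-coloring instance, the constraint functions, and the helper names are hypothetical. Exhaustive centralized search stands in for a real distributed solver, and nothing here models the LFM-generated constraints of VL-DCOPs.

```python
from itertools import product

# Hypothetical toy instance: each agent controls one variable with a small domain.
DOMAINS = {"a1": ["red", "green"], "a2": ["red", "green"], "a3": ["red", "green"]}


def same_color_penalty(x, y):
    """Graph-coloring constraint: neighbors pay 1 when they choose the same value."""
    return 1 if x == y else 0


# Binary constraints: (agent_i, agent_j) -> cost function over their two values.
CONSTRAINTS = {
    ("a1", "a2"): same_color_penalty,
    ("a2", "a3"): same_color_penalty,
    ("a1", "a3"): same_color_penalty,
}


def total_cost(assignment):
    """Sum of all constraint costs under a full assignment."""
    return sum(f(assignment[i], assignment[j]) for (i, j), f in CONSTRAINTS.items())


def solve_brute_force(domains):
    """Enumerate every joint assignment and return a minimum-cost one.

    Real DCOP algorithms (e.g., DPOP, Max-Sum, DSA) distribute this search
    across agents; exhaustive search only keeps the sketch short.
    """
    agents = sorted(domains)
    best, best_cost = None, float("inf")
    for values in product(*(domains[a] for a in agents)):
        assignment = dict(zip(agents, values))
        cost = total_cost(assignment)
        if cost < best_cost:
            best, best_cost = assignment, cost
    return best, best_cost


assignment, cost = solve_brute_force(DOMAINS)
print(assignment, cost)
```

A triangle cannot be 2-colored, so the best achievable total cost in this instance is 1.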

MAPLE: A Framework for Active Preference Learning Guided by Large Language Models

Dec 10, 2024

Saaduddin Mahmud, Mason Nakamura, Shlomo Zilberstein

Abstract: The advent of large language models (LLMs) has sparked significant interest in using natural language for preference learning. However, existing methods often suffer from high computational costs, heavy demands on human supervision, and a lack of interpretability. To address these issues, we introduce MAPLE, a framework for large language model-guided Bayesian active preference learning. MAPLE leverages LLMs to model the distribution over preference functions, conditioning it on both natural language feedback and conventional preference learning feedback, such as pairwise trajectory rankings. MAPLE also employs active learning to systematically reduce uncertainty in this distribution and incorporates a language-conditioned active query selection mechanism that identifies informative, easy-to-answer queries, thus reducing human burden. We evaluate MAPLE's sample efficiency and preference inference quality on two benchmarks, including a real-world vehicle route planning benchmark built on OpenStreetMap data. Our results demonstrate that MAPLE accelerates the learning process and effectively improves humans' ability to answer queries.
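The core loop of Bayesian active preference learning can be sketched without an LLM: keep a posterior over candidate preference functions, ask the pairwise question with the highest expected information gain, and update with Bayes' rule. This is a minimal sketch under stated assumptions, not MAPLE itself: the three weight vectors, the route features, the Bradley-Terry answer model, and the `BETA` parameter are all hypothetical, and a uniform prior stands in for the LLM-conditioned distribution and language-conditioned query selection that MAPLE uses.

```python
import math
from itertools import combinations

# Hypothetical candidate preference functions: weights on (travel time, distance).
HYPOTHESES = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]

# Hypothetical candidate routes, each described by (travel time, distance).
ROUTES = {"A": (10.0, 4.0), "B": (6.0, 8.0), "C": (8.0, 5.0)}

BETA = 1.0  # assumed rationality parameter of the Bradley-Terry answer model


def utility(w, feats):
    """Negative weighted cost, so that higher utility means more preferred."""
    return -sum(wi * fi for wi, fi in zip(w, feats))


def p_prefers(w, a, b):
    """Bradley-Terry probability that a user with weights w prefers route a to b."""
    diff = utility(w, ROUTES[a]) - utility(w, ROUTES[b])
    return 1.0 / (1.0 + math.exp(-BETA * diff))


def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)


def expected_info_gain(belief, a, b):
    """Mutual information between the answer to 'a or b?' and the hypothesis."""
    p_a = sum(belief[w] * p_prefers(w, a, b) for w in belief)
    conditional = sum(
        belief[w] * entropy([p_prefers(w, a, b), 1 - p_prefers(w, a, b)]) for w in belief
    )
    return entropy([p_a, 1 - p_a]) - conditional


def select_query(belief):
    """Active learning step: ask the pairwise question with the highest expected gain."""
    return max(combinations(sorted(ROUTES), 2), key=lambda q: expected_info_gain(belief, *q))


def update(belief, a, b, prefers_a):
    """Bayes rule: posterior is proportional to prior times answer likelihood."""
    post = {}
    for w, prior in belief.items():
        p = p_prefers(w, a, b)
        post[w] = prior * (p if prefers_a else 1 - p)
    z = sum(post.values())
    return {w: v / z for w, v in post.items()}


belief = {w: 1 / len(HYPOTHESES) for w in HYPOTHESES}
query = select_query(belief)
belief = update(belief, *query, prefers_a=True)
```

With these numbers the A-versus-B query is selected first, because the three hypotheses disagree most sharply about it. An answer favoring A then shifts the posterior toward the distance-focused weights.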

RL$^3$: Boosting Meta Reinforcement Learning via RL inside RL$^2$

Jun 28, 2023

Abhinav Bhatia, Samer B. Nashed, Shlomo Zilberstein

Abstract: Meta reinforcement learning (meta-RL) methods such as RL$^2$ have emerged as promising approaches for learning data-efficient RL algorithms tailored to a given task distribution. However, these learned RL algorithms struggle with long-horizon and out-of-distribution tasks because they rely on recurrent neural networks to process the sequence of experiences instead of summarizing it into general RL components such as value functions. Moreover, even transformers have a practical limit on the length of history they can efficiently reason about before training and inference costs become prohibitive. In contrast, traditional RL algorithms are data-inefficient because they do not leverage domain knowledge, but they do converge to an optimal policy as more data becomes available. In this paper, we propose RL$^3$, a principled hybrid approach that combines traditional RL and meta-RL by feeding task-specific action-values, learned through traditional RL, as an input to the meta-RL neural network. We show that RL$^3$ earns greater cumulative reward than RL$^2$ on long-horizon and out-of-distribution tasks while maintaining the latter's efficiency in the short term. Experiments are conducted on both custom and benchmark discrete domains from the meta-RL literature that exhibit a range of short-term, long-term, and complex dependencies.
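The key architectural idea in RL$^3$, feeding action-values from a traditional RL learner into the meta-RL network, can be illustrated by how the network's per-step input might be assembled. This is a hypothetical sketch. It assumes a tabular Q-learner per task and an RL$^2$-style input of observation, previous action, and previous reward, simply concatenated with $Q(s, \cdot)$. The paper's exact encoding may differ, and the recurrent or transformer network itself is omitted.

```python
from collections import defaultdict


class TabularQ:
    """Per-task Q-learning estimator (the 'traditional RL' component, assumed tabular)."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.9):
        self.alpha = alpha
        self.gamma = gamma
        self.q = defaultdict(lambda: [0.0] * n_actions)

    def update(self, s, a, r, s_next, done):
        """Standard one-step Q-learning update toward r + gamma * max_a' Q(s', a')."""
        target = r if done else r + self.gamma * max(self.q[s_next])
        self.q[s][a] += self.alpha * (target - self.q[s][a])

    def values(self, s):
        return list(self.q[s])


def one_hot(i, n):
    v = [0.0] * n
    v[i] = 1.0
    return v


def meta_input(s, prev_a, prev_r, q_est, n_states, n_actions):
    """Hypothetical meta-RL input: RL^2-style features plus the Q-values of state s."""
    return one_hot(s, n_states) + one_hot(prev_a, n_actions) + [prev_r] + q_est.values(s)


# Usage: one Q-learning update, then build the input the meta-RL network would see.
q = TabularQ(n_actions=2)
q.update(s=0, a=1, r=1.0, s_next=1, done=True)
x = meta_input(s=0, prev_a=1, prev_r=1.0, q_est=q, n_states=3, n_actions=2)
```

Because the Q-learner keeps improving with more data regardless of episode length, this extra input gives the meta-learner a summary of experience that does not depend on how much history the network itself can process.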

Adaptive Rollout Length for Model-Based RL Using Model-Free Deep RL

Jun 07, 2022

Abhinav Bhatia, Philip S. Thomas, Shlomo Zilberstein

Abstract: Model-based reinforcement learning promises to learn an optimal policy from fewer interactions with the environment than model-free reinforcement learning by learning an intermediate model of the environment that predicts future interactions. When predicting a sequence of interactions, the rollout length, which limits the prediction horizon, is a critical hyperparameter because prediction accuracy diminishes in regions further away from real experience. As a result, a longer rollout length leads to an overall worse policy in the long run, whereas a shorter one extracts less benefit from the model; the hyperparameter therefore trades off policy quality against efficiency. In this work, we frame tuning the rollout length as a meta-level sequential decision-making problem that optimizes the final policy learned by model-based reinforcement learning under a fixed budget of environment interactions, adapting the hyperparameter dynamically based on feedback from the learning process, such as the accuracy of the model and the remaining budget of interactions. We use model-free deep reinforcement learning to solve the meta-level decision problem and demonstrate that our approach outperforms common heuristic baselines on two well-known reinforcement learning environments.
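Why prediction accuracy degrades with rollout length can be seen in a one-dimensional toy: if the true dynamics are $x_{t+1} = a x_t$ and the learned model uses a slightly wrong $\hat{a}$, the $k$-step error $|a^k - \hat{a}^k|\,|x_0|$ grows with $k$. The sketch below, with hypothetical values, computes that error and a simple threshold rule for picking a rollout length. The rule is shown only for contrast; the paper instead learns the rollout-length policy with model-free deep RL, using feedback such as model accuracy and the remaining interaction budget.

```python
def k_step_error(a_true, a_model, x0, k):
    """Absolute error after a k-step rollout of x_{t+1} = a * x_t with a learned a."""
    return abs(a_true ** k - a_model ** k) * abs(x0)


def longest_rollout_within(a_true, a_model, x0, tol, k_max):
    """Hypothetical threshold rule: the longest k whose rollout error stays within tol.

    Uses a_true only because this is a toy; a real learner cannot observe it.
    """
    best = 0
    for k in range(1, k_max + 1):
        if k_step_error(a_true, a_model, x0, k) > tol:
            break
        best = k
    return best


# A 5% model error compounds: roughly 0.05, 0.10, 0.16, 0.22, 0.28 for k = 1..5.
errors = [k_step_error(1.0, 1.05, 1.0, k) for k in range(1, 6)]
```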

Dense Crowd Flow-Informed Path Planning

Jun 01, 2022

Emily Pruc, Shlomo Zilberstein, Joydeep Biswas

Abstract: Both pedestrian and robot comfort are of the highest priority whenever a robot is placed in an environment containing humans. For pedestrian-unaware mobile robots, this emphasis on safety leads to the freezing robot problem: a robot confronted with a large dynamic group of obstacles (such as a crowd of pedestrians) deems all forward navigation unsafe and stops in place. To navigate in a socially compliant manner while avoiding the freezing robot problem, we seek to understand the flow of pedestrians in crowded scenarios. By treating the pedestrians in the crowd as particles moved along by the crowd itself, we model the system as a time-dependent flow field. From this flow field we extract flow segments that reflect the motion patterns emerging from the crowd. These motion patterns can then be accounted for during the control and navigation of a mobile robot, allowing it to move safely within the flow of the crowd to reach a desired location within or beyond it. We combine flow-field extraction with a discrete heuristic search to create Flow-Informed path planning (FIPP). We provide empirical results showing that, compared with a trajectory-rollout local path planner, a robot using FIPP not only reached its goal more quickly but was also more socially compliant than a robot using traditional techniques, both in simulation and on real robots.
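Combining a flow field with discrete heuristic search can be sketched as A* on a grid whose step costs penalize moving against the local crowd flow. In this hypothetical sketch the flow field is given by hand rather than extracted from pedestrian motion, and the cost function (a unit step plus a penalty proportional to opposing the flow) is an assumption, not FIPP's actual formulation.

```python
import heapq


def flow_cost(move, flow, penalty):
    """Unit step cost plus a penalty proportional to moving against the local flow."""
    against = -(move[0] * flow[0] + move[1] * flow[1])
    return 1.0 + penalty * max(0.0, against)


def flow_aware_astar(flow_field, start, goal, penalty=2.0):
    """A* over grid cells (row, col); flow_field[r][c] is the crowd flow vector there."""
    rows, cols = len(flow_field), len(flow_field[0])

    def h(cell):  # Manhattan distance; admissible because every step costs >= 1
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(h(start), 0.0, start)]
    came_from = {start: None}
    g = {start: 0.0}
    while frontier:
        _, cost, cell = heapq.heappop(frontier)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1], cost
        if cost > g[cell]:
            continue  # stale queue entry
        for move in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cell[0] + move[0], cell[1] + move[1])
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                continue
            new_cost = cost + flow_cost(move, flow_field[nxt[0]][nxt[1]], penalty)
            if new_cost < g.get(nxt, float("inf")):
                g[nxt] = new_cost
                came_from[nxt] = cell
                heapq.heappush(frontier, (new_cost + h(nxt), new_cost, nxt))
    return None, float("inf")


FLOW = [
    [(0, -1)] * 5,  # row 0: crowd moving west
    [(0, 1)] * 5,   # row 1: crowd moving east
    [(0, 0)] * 5,   # row 2: no flow
]
path, cost = flow_aware_astar(FLOW, (0, 0), (0, 4))
```

Here the robot detours through the eastward-flowing row instead of pushing against the westward flow in row 0, for a total cost of 6 rather than 12.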

A Unifying Framework for Causal Explanation of Sequential Decision Making

May 30, 2022

Samer B. Nashed, Saaduddin Mahmud, Claudia V. Goldman, Shlomo Zilberstein

Abstract: We present a novel framework for causal explanations of stochastic, sequential decision-making systems. Building on the well-studied structural causal model paradigm for causal reasoning, we show how to identify semantically distinct types of explanations for agent actions using a single unified approach. We provide results on the generality of this framework and its run-time bounds, and offer several approximation techniques. Finally, we discuss several qualitative scenarios that illustrate the framework's flexibility and efficacy.

* 9 pages, 4 figures, conference
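A minimal flavor of structural-causal-model explanation is a but-for test: intervene on one state variable at a time and check whether the agent's action changes. The single structural equation, variable names, and domains below are hypothetical, and this simple test does not capture the multiple, semantically distinct explanation types the framework identifies.

```python
# Hypothetical exogenous state variables and their possible values.
STATE_DOMAINS = {"obstacle": [False, True], "light": ["red", "green"]}


def policy(state):
    """Structural equation for the agent's action given the state variables."""
    if state["obstacle"]:
        return "brake"
    if state["light"] == "red":
        return "stop"
    return "go"


def but_for_causes(state, structural_eq, domains):
    """Return the actual action and the variables whose intervention changes it."""
    actual = structural_eq(state)
    causes = []
    for var, values in domains.items():
        for value in values:
            if value == state[var]:
                continue
            counterfactual = {**state, var: value}  # do(var := value)
            if structural_eq(counterfactual) != actual:
                causes.append(var)
                break
    return actual, causes


action, causes = but_for_causes({"obstacle": True, "light": "red"}, policy, STATE_DOMAINS)
```

With an obstacle present and a red light, only `obstacle` is reported: switching the light to green leaves the action at `brake`.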

Competence-Aware Path Planning via Introspective Perception

Sep 28, 2021

Sadegh Rabiee, Connor Basich, Kyle Hollins Wray, Shlomo Zilberstein, Joydeep Biswas

Abstract: Robots deployed in the real world over extended periods of time need to reason about unexpected failures, learn to predict them, and proactively take actions to avoid future failures. Existing approaches to competence-aware planning are either model-based, requiring explicit enumeration of known failure modes, or purely statistical, using state- and location-specific failure statistics to infer competence. We instead propose a structured model-free approach to competence-aware planning that reasons about plan execution failures due to errors in perception, without requiring a priori enumeration of failure modes or location-specific failure statistics. We introduce competence-aware path planning via introspective perception (CPIP), a Bayesian framework to iteratively learn and exploit task-level competence in novel deployment environments. CPIP factorizes the competence-aware planning problem into two components. First, perception errors are learned in a model-free and location-agnostic setting via introspective perception prior to deployment in novel environments. Second, during actual deployments, the prediction of task-level failures is learned in a context-aware setting. Simulation experiments show that CPIP outperforms the frequentist baseline in multiple mobile robot tasks, and real robot experiments in an environment with perceptually challenging obstacles and terrain further validate the approach.

* 8 pages, 8 figures
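Task-level competence estimates of the kind described above can be sketched with a Beta-Bernoulli posterior over failure probability per context, folded into route costs as an expected failure penalty. The context labels, priors, costs, and helper names are hypothetical, and the sketch ignores the introspective-perception component entirely; it only illustrates how a Bayesian failure estimate can steer a planner away from perceptually risky routes.

```python
class BetaCompetence:
    """Beta-Bernoulli posterior over failure probability for each context."""

    def __init__(self, prior_fail=1.0, prior_success=1.0):
        self.prior = (prior_fail, prior_success)
        self.counts = {}  # context -> (alpha, beta) pseudo-counts

    def observe(self, context, failed):
        a, b = self.counts.get(context, self.prior)
        self.counts[context] = (a + 1, b) if failed else (a, b + 1)

    def p_fail(self, context):
        """Posterior mean of the failure probability."""
        a, b = self.counts.get(context, self.prior)
        return a / (a + b)


def expected_route_cost(route, model, failure_cost):
    """Nominal segment costs plus expected failure penalties (assumed cost model)."""
    return sum(cost + model.p_fail(ctx) * failure_cost for cost, ctx in route)


model = BetaCompetence()
for failed in (True, True, True, False):
    model.observe("glass_wall", failed)
for _ in range(4):
    model.observe("open_floor", failed=False)

# Each route is a list of (nominal cost, context) segments.
routes = {"short": [(5.0, "glass_wall")], "long": [(8.0, "open_floor")]}
best = min(routes, key=lambda r: expected_route_cost(routes[r], model, failure_cost=10.0))
```

After three failures in four traversals near the glass wall, the longer but reliable route has the lower expected cost.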

Agent-aware State Estimation in Autonomous Vehicles

Aug 01, 2021

Shane Parr, Ishan Khatri, Justin Svegliato, Shlomo Zilberstein

Abstract: Autonomous systems often operate in environments where the behavior of multiple agents is coordinated by a shared global state. Reliable estimation of the global state is thus critical for operating successfully in a multi-agent setting. We introduce agent-aware state estimation, a framework for computing indirect estimates of state from observations of the behavior of other agents in the environment. We also introduce transition-independent agent-aware state estimation, a tractable class of agent-aware state estimation, and show that it allows the speed of inference to scale linearly with the number of agents in the environment. As an example, we model traffic light classification in instances of complete loss of direct observation. By taking into account observations of vehicular behavior from multiple directions of traffic, our approach achieves higher accuracy than existing traffic-light-only HMM methods on a real-world autonomous vehicle data set under a variety of simulated occlusion scenarios.

* To appear in IROS 2021
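The traffic-light example can be sketched as an HMM forward filter in which the hidden state is the light phase and the observations are other vehicles' behavior. This sketch assumes each vehicle's behavior is conditionally independent given the light, so the update multiplies one likelihood per vehicle and its cost grows linearly with the number of vehicles. That simplification, the two-phase model, and all probabilities are hypothetical and are not necessarily the paper's transition-independence condition.

```python
# Hidden light phase; the east-west light is assumed to be the opposite of north-south.
STATES = ("ns_green", "ns_red")
TRANSITION = {  # hypothetical phase-persistence probabilities per time step
    "ns_green": {"ns_green": 0.9, "ns_red": 0.1},
    "ns_red": {"ns_green": 0.1, "ns_red": 0.9},
}
P_MOVE_IF_GREEN = 0.9  # hypothetical behavior model
P_MOVE_IF_RED = 0.1


def vehicle_likelihood(state, direction, moving):
    """P(observed behavior of one vehicle | light phase)."""
    sees_green = (state == "ns_green") == (direction == "ns")
    p_move = P_MOVE_IF_GREEN if sees_green else P_MOVE_IF_RED
    return p_move if moving else 1.0 - p_move


def forward_step(belief, observations):
    """One HMM forward step from (direction, moving) observations of other vehicles.

    Per-vehicle likelihoods are multiplied under the conditional-independence
    assumption, so the update is linear in the number of observed vehicles.
    """
    predicted = {s: sum(belief[p] * TRANSITION[p][s] for p in STATES) for s in STATES}
    unnormalized = {}
    for s in STATES:
        likelihood = 1.0
        for direction, moving in observations:
            likelihood *= vehicle_likelihood(s, direction, moving)
        unnormalized[s] = predicted[s] * likelihood
    z = sum(unnormalized.values())
    return {s: v / z for s, v in unnormalized.items()}


# The light itself is fully occluded; only vehicle behavior is observed.
belief = {"ns_green": 0.5, "ns_red": 0.5}
belief = forward_step(belief, [("ns", True), ("ns", True), ("ew", False)])
```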

Mitigating Negative Side Effects via Environment Shaping

Feb 13, 2021

Sandhya Saisubramanian, Shlomo Zilberstein

Abstract: Agents operating in unstructured environments often produce negative side effects (NSE), which are difficult to identify at design time. While the agent can learn to mitigate the side effects from human feedback, such feedback is often expensive and the rate of learning is sensitive to the agent's state representation. We examine how humans can assist an agent beyond providing feedback, exploiting their broader scope of knowledge to mitigate the impacts of NSE. We formulate this problem as a human-agent team with decoupled objectives. The agent optimizes its assigned task, during which its actions may produce NSE. The human shapes the environment through minor reconfiguration actions to mitigate the impacts of the agent's side effects, without affecting the agent's ability to complete its assigned task. We present an algorithm to solve this problem and analyze its theoretical properties. Through experiments with human subjects, we assess the willingness of users to perform minor environment modifications to mitigate the impacts of NSE. Empirical evaluation of our approach shows that the proposed framework can successfully mitigate NSE without affecting the agent's ability to complete its assigned task.

* 9 pages
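The human's side of the decoupled formulation can be sketched as choosing, from a set of candidate reconfigurations, the one that minimizes NSE plus human effort while leaving the agent's task cost unchanged. The option names and numbers are hypothetical, and each option's NSE and task cost are given directly rather than computed by re-solving the agent's planning problem in the reconfigured environment, as a real system would.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shaping:
    name: str
    effort: float           # human cost of performing the reconfiguration
    nse: float              # NSE penalty of the agent's optimal policy after shaping
    agent_task_cost: float  # agent's optimal task cost after shaping


def choose_shaping(options, baseline_task_cost, tol=1e-9):
    """Minimize NSE + effort over options that keep the agent's task cost unchanged.

    Assumes the options include a no-op, so the feasible set is never empty.
    """
    feasible = [o for o in options if o.agent_task_cost <= baseline_task_cost + tol]
    return min(feasible, key=lambda o: o.nse + o.effort)


OPTIONS = [
    Shaping("no_op", effort=0.0, nse=5.0, agent_task_cost=10.0),
    Shaping("cover_rug", effort=1.0, nse=1.0, agent_task_cost=10.0),
    Shaping("block_hallway", effort=0.5, nse=0.0, agent_task_cost=14.0),
]
best = choose_shaping(OPTIONS, baseline_task_cost=10.0)
```

`block_hallway` would eliminate the side effect entirely but is rejected because it raises the agent's task cost.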
