Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Sriram Ganapathi Subramanian

Simplifying Bayesian Optimization Via In-Context Direct Optimum Sampling

May 29, 2025

Gustavo Sutter Pessurno de Carvalho, Mohammed Abdulrahman, Hao Wang, Sriram Ganapathi Subramanian, Marc St-Aubin, Sharon O'Sullivan, Lawrence Wan, Luis Ricardez-Sandoval, Pascal Poupart, Agustinus Kristiadi

Abstract:The optimization of expensive black-box functions is ubiquitous in science and engineering. A common solution to this problem is Bayesian optimization (BO), which is generally comprised of two components: (i) a surrogate model and (ii) an acquisition function, which generally require expensive re-training and optimization steps at each iteration, respectively. Although recent work enabled in-context surrogate models that do not require re-training, virtually all existing BO methods still require acquisition function maximization to select the next observation, which introduces many knobs to tune, such as Monte Carlo samplers and multi-start optimizers. In this work, we propose a completely in-context, zero-shot solution for BO that does not require surrogate fitting or acquisition function optimization. This is done by using a pre-trained deep generative model to directly sample from the posterior over the optimum point. We show that this process is equivalent to Thompson sampling and demonstrate the capabilities and cost-effectiveness of our foundation model on a suite of real-world benchmarks. We achieve an efficiency gain of more than 35x in terms of wall-clock time when compared with Gaussian process-based BO, enabling efficient parallel and distributed BO, e.g., for high-throughput optimization.

Via

Access Paper or Ask Questions

Learning to Negotiate via Voluntary Commitment

Mar 05, 2025

Shuhui Zhu, Baoxiang Wang, Sriram Ganapathi Subramanian, Pascal Poupart

Abstract:The partial alignment and conflict of autonomous agents lead to mixed-motive scenarios in many real-world applications. However, agents may fail to cooperate in practice even when cooperation yields a better outcome. One well known reason for this failure comes from non-credible commitments. To facilitate commitments among agents for better cooperation, we define Markov Commitment Games (MCGs), a variant of commitment games, where agents can voluntarily commit to their proposed future plans. Based on MCGs, we propose a learnable commitment protocol via policy gradients. We further propose incentive-compatible learning to accelerate convergence to equilibria with better social welfare. Experimental results in challenging mixed-motive tasks demonstrate faster empirical convergence and higher returns for our method compared with its counterparts. Our code is available at https://github.com/shuhui-zhu/DCL.

* AISTATS 2025

Via

Access Paper or Ask Questions

A Survey of Inverse Constrained Reinforcement Learning: Definitions, Progress and Challenges

Sep 11, 2024

Guiliang Liu, Sheng Xu, Shicheng Liu, Ashish Gaurav, Sriram Ganapathi Subramanian, Pascal Poupart

Figure 1 for A Survey of Inverse Constrained Reinforcement Learning: Definitions, Progress and Challenges

Figure 2 for A Survey of Inverse Constrained Reinforcement Learning: Definitions, Progress and Challenges

Figure 3 for A Survey of Inverse Constrained Reinforcement Learning: Definitions, Progress and Challenges

Figure 4 for A Survey of Inverse Constrained Reinforcement Learning: Definitions, Progress and Challenges

Abstract:Inverse Constrained Reinforcement Learning (ICRL) is the task of inferring the implicit constraints followed by expert agents from their demonstration data. As an emerging research topic, ICRL has received considerable attention in recent years. This article presents a categorical survey of the latest advances in ICRL. It serves as a comprehensive reference for machine learning researchers and practitioners, as well as starters seeking to comprehend the definitions, advancements, and important challenges in ICRL. We begin by formally defining the problem and outlining the algorithmic framework that facilitates constraint inference across various scenarios. These include deterministic or stochastic environments, environments with limited demonstrations, and multiple agents. For each context, we illustrate the critical challenges and introduce a series of fundamental methods to tackle these issues. This survey encompasses discrete, virtual, and realistic environments for evaluating ICRL agents. We also delve into the most pertinent applications of ICRL, such as autonomous driving, robot control, and sports analytics. To stimulate continuing research, we conclude the survey with a discussion of key unresolved questions in ICRL that can effectively foster a bridge between theoretical understanding and practical industrial applications.

* 28 pages

Via

Access Paper or Ask Questions

Confidence Aware Inverse Constrained Reinforcement Learning

Jun 24, 2024

Sriram Ganapathi Subramanian, Guiliang Liu, Mohammed Elmahgiubi, Kasra Rezaee, Pascal Poupart

Abstract:In coming up with solutions to real-world problems, humans implicitly adhere to constraints that are too numerous and complex to be specified completely. However, reinforcement learning (RL) agents need these constraints to learn the correct optimal policy in these settings. The field of Inverse Constraint Reinforcement Learning (ICRL) deals with this problem and provides algorithms that aim to estimate the constraints from expert demonstrations collected offline. Practitioners prefer to know a measure of confidence in the estimated constraints, before deciding to use these constraints, which allows them to only use the constraints that satisfy a desired level of confidence. However, prior works do not allow users to provide the desired level of confidence for the inferred constraints. This work provides a principled ICRL method that can take a confidence level with a set of expert demonstrations and outputs a constraint that is at least as constraining as the true underlying constraint with the desired level of confidence. Further, unlike previous methods, this method allows a user to know if the number of expert trajectories is insufficient to learn a constraint with a desired level of confidence, and therefore collect more expert trajectories as required to simultaneously learn constraints with the desired level of confidence and a policy that achieves the desired level of performance.

* Paper to appear in ICML 2024

Via

Access Paper or Ask Questions

How Useful is Intermittent, Asynchronous Expert Feedback for Bayesian Optimization?

Jun 10, 2024

Agustinus Kristiadi, Felix Strieth-Kalthoff, Sriram Ganapathi Subramanian, Vincent Fortuin, Pascal Poupart, Geoff Pleiss

Figure 1 for How Useful is Intermittent, Asynchronous Expert Feedback for Bayesian Optimization?

Figure 2 for How Useful is Intermittent, Asynchronous Expert Feedback for Bayesian Optimization?

Figure 3 for How Useful is Intermittent, Asynchronous Expert Feedback for Bayesian Optimization?

Figure 4 for How Useful is Intermittent, Asynchronous Expert Feedback for Bayesian Optimization?

Abstract:Bayesian optimization (BO) is an integral part of automated scientific discovery -- the so-called self-driving lab -- where human inputs are ideally minimal or at least non-blocking. However, scientists often have strong intuition, and thus human feedback is still useful. Nevertheless, prior works in enhancing BO with expert feedback, such as by incorporating it in an offline or online but blocking (arrives at each BO iteration) manner, are incompatible with the spirit of self-driving labs. In this work, we study whether a small amount of randomly arriving expert feedback that is being incorporated in a non-blocking manner can improve a BO campaign. To this end, we run an additional, independent computing thread on top of the BO loop to handle the feedback-gathering process. The gathered feedback is used to learn a Bayesian preference model that can readily be incorporated into the BO thread, to steer its exploration-exploitation process. Experiments on toy and chemistry datasets suggest that even just a few intermittent, asynchronous expert feedback can be useful for improving or constraining BO. This can especially be useful for its implication in improving self-driving labs, e.g. making them more data-efficient and less costly.

* AABI 2024. Code: https://github.com/wiseodd/bo-async-feedback

Via

Access Paper or Ask Questions

ChemGymRL: An Interactive Framework for Reinforcement Learning for Digital Chemistry

May 23, 2023

Chris Beeler, Sriram Ganapathi Subramanian, Kyle Sprague, Nouha Chatti, Colin Bellinger, Mitchell Shahen, Nicholas Paquin, Mark Baula, Amanuel Dawit, Zihan Yang(+3 more)

Figure 1 for ChemGymRL: An Interactive Framework for Reinforcement Learning for Digital Chemistry

Figure 2 for ChemGymRL: An Interactive Framework for Reinforcement Learning for Digital Chemistry

Figure 3 for ChemGymRL: An Interactive Framework for Reinforcement Learning for Digital Chemistry

Figure 4 for ChemGymRL: An Interactive Framework for Reinforcement Learning for Digital Chemistry

Abstract:This paper provides a simulated laboratory for making use of Reinforcement Learning (RL) for chemical discovery. Since RL is fairly data intensive, training agents `on-the-fly' by taking actions in the real world is infeasible and possibly dangerous. Moreover, chemical processing and discovery involves challenges which are not commonly found in RL benchmarks and therefore offer a rich space to work in. We introduce a set of highly customizable and open-source RL environments, ChemGymRL, based on the standard Open AI Gym template. ChemGymRL supports a series of interconnected virtual chemical benches where RL agents can operate and train. The paper introduces and details each of these benches using well-known chemical reactions as illustrative examples, and trains a set of standard RL algorithms in each of these benches. Finally, discussion and comparison of the performances of several standard RL methods are provided in addition to a list of directions for future work as a vision for the further development and usage of ChemGymRL.

* 19 pages, 13 figures, 2 tables

Via

Access Paper or Ask Questions

Learning from Multiple Independent Advisors in Multi-agent Reinforcement Learning

Jan 26, 2023

Sriram Ganapathi Subramanian, Matthew E. Taylor, Kate Larson, Mark Crowley

Abstract:Multi-agent reinforcement learning typically suffers from the problem of sample inefficiency, where learning suitable policies involves the use of many data samples. Learning from external demonstrators is a possible solution that mitigates this problem. However, most prior approaches in this area assume the presence of a single demonstrator. Leveraging multiple knowledge sources (i.e., advisors) with expertise in distinct aspects of the environment could substantially speed up learning in complex environments. This paper considers the problem of simultaneously learning from multiple independent advisors in multi-agent reinforcement learning. The approach leverages a two-level Q-learning architecture, and extends this framework from single-agent to multi-agent settings. We provide principled algorithms that incorporate a set of advisors by both evaluating the advisors at each state and subsequently using the advisors to guide action selection. We also provide theoretical convergence and sample complexity guarantees. Experimentally, we validate our approach in three different test-beds and show that our algorithms give better performances than baselines, can effectively integrate the combined expertise of different advisors, and learn to ignore bad advice.

* Paper to appear in AAMAS 2023, London, UK

Via

Access Paper or Ask Questions

Multi-Agent Advisor Q-Learning

Nov 08, 2021

Sriram Ganapathi Subramanian, Matthew E. Taylor, Kate Larson, Mark Crowley

Figure 1 for Multi-Agent Advisor Q-Learning

Figure 2 for Multi-Agent Advisor Q-Learning

Figure 3 for Multi-Agent Advisor Q-Learning

Figure 4 for Multi-Agent Advisor Q-Learning

Abstract:In the last decade, there have been significant advances in multi-agent reinforcement learning (MARL) but there are still numerous challenges, such as high sample complexity and slow convergence to stable policies, that need to be overcome before wide-spread deployment is possible. However, many real-world environments already, in practice, deploy sub-optimal or heuristic approaches for generating policies. An interesting question which arises is how to best use such approaches as advisors to help improve reinforcement learning in multi-agent domains. In this paper, we provide a principled framework for incorporating action recommendations from online sub-optimal advisors in multi-agent settings. We describe the problem of ADvising Multiple Intelligent Reinforcement Agents (ADMIRAL) in nonrestrictive general-sum stochastic game environments and present two novel Q-learning based algorithms: ADMIRAL - Decision Making (ADMIRAL-DM) and ADMIRAL - Advisor Evaluation (ADMIRAL-AE), which allow us to improve learning by appropriately incorporating advice from an advisor (ADMIRAL-DM), and evaluate the effectiveness of an advisor (ADMIRAL-AE). We analyze the algorithms theoretically and provide fixed-point guarantees regarding their learning in general-sum stochastic games. Furthermore, extensive experiments illustrate that these algorithms: can be used in a variety of environments, have performances that compare favourably to other related baselines, can scale to large state-action spaces, and are robust to poor advice from advisors.

* New version has some typos corrected

Via

Access Paper or Ask Questions

Investigation of Independent Reinforcement Learning Algorithms in Multi-Agent Environments

Nov 01, 2021

Ken Ming Lee, Sriram Ganapathi Subramanian, Mark Crowley

Figure 1 for Investigation of Independent Reinforcement Learning Algorithms in Multi-Agent Environments

Figure 2 for Investigation of Independent Reinforcement Learning Algorithms in Multi-Agent Environments

Figure 3 for Investigation of Independent Reinforcement Learning Algorithms in Multi-Agent Environments

Figure 4 for Investigation of Independent Reinforcement Learning Algorithms in Multi-Agent Environments

Abstract:Independent reinforcement learning algorithms have no theoretical guarantees for finding the best policy in multi-agent settings. However, in practice, prior works have reported good performance with independent algorithms in some domains and bad performance in others. Moreover, a comprehensive study of the strengths and weaknesses of independent algorithms is lacking in the literature. In this paper, we carry out an empirical comparison of the performance of independent algorithms on four PettingZoo environments that span the three main categories of multi-agent environments, i.e., cooperative, competitive, and mixed. We show that in fully-observable environments, independent algorithms can perform on par with multi-agent algorithms in cooperative and competitive settings. For the mixed environments, we show that agents trained via independent algorithms learn to perform well individually, but fail to learn to cooperate with allies and compete with enemies. We also show that adding recurrence improves the learning of independent algorithms in cooperative partially observable environments.

* 15 pages, 7 figures, Accepted for NeurIPS 2021 Deep Reinforcement Learning Workshop

Via

Access Paper or Ask Questions

The Effect of Q-function Reuse on the Total Regret of Tabular, Model-Free, Reinforcement Learning

Mar 07, 2021

Volodymyr Tkachuk, Sriram Ganapathi Subramanian, Matthew E. Taylor

Figure 1 for The Effect of Q-function Reuse on the Total Regret of Tabular, Model-Free, Reinforcement Learning

Figure 2 for The Effect of Q-function Reuse on the Total Regret of Tabular, Model-Free, Reinforcement Learning

Abstract:Some reinforcement learning methods suffer from high sample complexity causing them to not be practical in real-world situations. $Q$-function reuse, a transfer learning method, is one way to reduce the sample complexity of learning, potentially improving usefulness of existing algorithms. Prior work has shown the empirical effectiveness of $Q$-function reuse for various environments when applied to model-free algorithms. To the best of our knowledge, there has been no theoretical work showing the regret of $Q$-function reuse when applied to the tabular, model-free setting. We aim to bridge the gap between theoretical and empirical work in $Q$-function reuse by providing some theoretical insights on the effectiveness of $Q$-function reuse when applied to the $Q$-learning with UCB-Hoeffding algorithm. Our main contribution is showing that in a specific case if $Q$-function reuse is applied to the $Q$-learning with UCB-Hoeffding algorithm it has a regret that is independent of the state or action space. We also provide empirical results supporting our theoretical findings.

* 7 pages, 2 figures, submitted to ALA 2021

Via

Access Paper or Ask Questions