Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Seth Karten

PokéChamp: an Expert-level Minimax Language Agent

Mar 06, 2025

Seth Karten, Andy Luu Nguyen, Chi Jin

Abstract:We introduce Pok\'eChamp, a minimax agent powered by Large Language Models (LLMs) for Pok\'emon battles. Built on a general framework for two-player competitive games, Pok\'eChamp leverages the generalist capabilities of LLMs to enhance minimax tree search. Specifically, LLMs replace three key modules: (1) player action sampling, (2) opponent modeling, and (3) value function estimation, enabling the agent to effectively utilize gameplay history and human knowledge to reduce the search space and address partial observability. Notably, our framework requires no additional LLM training. We evaluate Pok\'eChamp in the popular Gen 9 OU format. When powered by GPT-4o, it achieves a win rate of 76% against the best existing LLM-based bot and 84% against the strongest rule-based bot, demonstrating its superior performance. Even with an open-source 8-billion-parameter Llama 3.1 model, Pok\'eChamp consistently outperforms the previous best LLM-based bot, Pok\'ellmon powered by GPT-4o, with a 64% win rate. Pok\'eChamp attains a projected Elo of 1300-1500 on the Pok\'emon Showdown online ladder, placing it among the top 30%-10% of human players. In addition, this work compiles the largest real-player Pok\'emon battle dataset, featuring over 3 million games, including more than 500k high-Elo matches. Based on this dataset, we establish a series of battle benchmarks and puzzles to evaluate specific battling skills. We further provide key updates to the local game engine. We hope this work fosters further research that leverage Pok\'emon battle as benchmark to integrate LLM technologies with game-theoretic algorithms addressing general multiagent problems. Videos, code, and dataset available at https://sites.google.com/view/pokechamp-llm.

* 24 pages, 13 figures

Via

Access Paper or Ask Questions

FightLadder: A Benchmark for Competitive Multi-Agent Reinforcement Learning

Jun 04, 2024

Wenzhe Li, Zihan Ding, Seth Karten, Chi Jin

Figure 1 for FightLadder: A Benchmark for Competitive Multi-Agent Reinforcement Learning

Figure 2 for FightLadder: A Benchmark for Competitive Multi-Agent Reinforcement Learning

Figure 3 for FightLadder: A Benchmark for Competitive Multi-Agent Reinforcement Learning

Figure 4 for FightLadder: A Benchmark for Competitive Multi-Agent Reinforcement Learning

Abstract:Recent advances in reinforcement learning (RL) heavily rely on a variety of well-designed benchmarks, which provide environmental platforms and consistent criteria to evaluate existing and novel algorithms. Specifically, in multi-agent RL (MARL), a plethora of benchmarks based on cooperative games have spurred the development of algorithms that improve the scalability of cooperative multi-agent systems. However, for the competitive setting, a lightweight and open-sourced benchmark with challenging gaming dynamics and visual inputs has not yet been established. In this work, we present FightLadder, a real-time fighting game platform, to empower competitive MARL research. Along with the platform, we provide implementations of state-of-the-art MARL algorithms for competitive games, as well as a set of evaluation metrics to characterize the performance and exploitability of agents. We demonstrate the feasibility of this platform by training a general agent that consistently defeats 12 built-in characters in single-player mode, and expose the difficulty of training a non-exploitable agent without human knowledge and demonstrations in two-player mode. FightLadder provides meticulously designed environments to address critical challenges in competitive MARL research, aiming to catalyze a new era of discovery and advancement in the field. Videos and code at https://sites.google.com/view/fightladder/home.

* ICML 2024

Via

Access Paper or Ask Questions

On the Role of Emergent Communication for Social Learning in Multi-Agent Reinforcement Learning

Feb 28, 2023

Seth Karten, Siva Kailas, Huao Li, Katia Sycara

Abstract:Explicit communication among humans is key to coordinating and learning. Social learning, which uses cues from experts, can greatly benefit from the usage of explicit communication to align heterogeneous policies, reduce sample complexity, and solve partially observable tasks. Emergent communication, a type of explicit communication, studies the creation of an artificial language to encode a high task-utility message directly from data. However, in most cases, emergent communication sends insufficiently compressed messages with little or null information, which also may not be understandable to a third-party listener. This paper proposes an unsupervised method based on the information bottleneck to capture both referential complexity and task-specific utility to adequately explore sparse social communication scenarios in multi-agent reinforcement learning (MARL). We show that our model is able to i) develop a natural-language-inspired lexicon of messages that is independently composed of a set of emergent concepts, which span the observations and intents with minimal bits, ii) develop communication to align the action policies of heterogeneous agents with dissimilar feature models, and iii) learn a communication policy from watching an expert's action policy, which we term `social shadowing'.

* 14 pages, 5 figures

Via

Access Paper or Ask Questions

Towards True Lossless Sparse Communication in Multi-Agent Systems

Nov 30, 2022

Seth Karten, Mycal Tucker, Siva Kailas, Katia Sycara

Abstract:Communication enables agents to cooperate to achieve their goals. Learning when to communicate, i.e., sparse (in time) communication, and whom to message is particularly important when bandwidth is limited. Recent work in learning sparse individualized communication, however, suffers from high variance during training, where decreasing communication comes at the cost of decreased reward, particularly in cooperative tasks. We use the information bottleneck to reframe sparsity as a representation learning problem, which we show naturally enables lossless sparse communication at lower budgets than prior art. In this paper, we propose a method for true lossless sparsity in communication via Information Maximizing Gated Sparse Multi-Agent Communication (IMGS-MAC). Our model uses two individualized regularization objectives, an information maximization autoencoder and sparse communication loss, to create informative and sparse communication. We evaluate the learned communication `language' through direct causal analysis of messages in non-sparse runs to determine the range of lossless sparse budgets, which allow zero-shot sparsity, and the range of sparse budgets that will inquire a reward loss, which is minimized by our learned gating function with few-shot sparsity. To demonstrate the efficacy of our results, we experiment in cooperative multi-agent tasks where communication is essential for success. We evaluate our model with both continuous and discrete messages. We focus our analysis on a variety of ablations to show the effect of message representations, including their properties, and lossless performance of our model.

* 12 pages, 6 figures

Via

Access Paper or Ask Questions

Probe-Based Interventions for Modifying Agent Behavior

Jan 26, 2022

Mycal Tucker, William Kuhl, Khizer Shahid, Seth Karten, Katia Sycara, Julie Shah

Abstract:Neural nets are powerful function approximators, but the behavior of a given neural net, once trained, cannot be easily modified. We wish, however, for people to be able to influence neural agents' actions despite the agents never training with humans, which we formalize as a human-assisted decision-making problem. Inspired by prior art initially developed for model explainability, we develop a method for updating representations in pre-trained neural nets according to externally-specified properties. In experiments, we show how our method may be used to improve human-agent team performance for a variety of neural networks from image classifiers to agents in multi-agent reinforcement learning settings.

Via

Access Paper or Ask Questions

The Enforcers: Consistent Sparse-Discrete Methods for Constraining Informative Emergent Communication

Jan 19, 2022

Seth Karten, Siddharth Agrawal, Mycal Tucker, Dana Hughes, Michael Lewis, Julie Shah, Katia Sycara

Figure 1 for The Enforcers: Consistent Sparse-Discrete Methods for Constraining Informative Emergent Communication

Figure 2 for The Enforcers: Consistent Sparse-Discrete Methods for Constraining Informative Emergent Communication

Figure 3 for The Enforcers: Consistent Sparse-Discrete Methods for Constraining Informative Emergent Communication

Figure 4 for The Enforcers: Consistent Sparse-Discrete Methods for Constraining Informative Emergent Communication

Abstract:Communication enables agents to cooperate to achieve their goals. Learning when to communicate, i.e. sparse communication, is particularly important where bandwidth is limited, in situations where agents interact with humans, in partially observable scenarios where agents must convey information unavailable to others, and in non-cooperative scenarios where agents may hide information to gain a competitive advantage. Recent work in learning sparse communication, however, suffers from high variance training where, the price of decreasing communication is a decrease in reward, particularly in cooperative tasks. Sparse communications are necessary to match agent communication to limited human bandwidth. Humans additionally communicate via discrete linguistic tokens, previously shown to decrease task performance when compared to continuous communication vectors. This research addresses the above issues by limiting the loss in reward of decreasing communication and eliminating the penalty for discretization. In this work, we successfully constrain training using a learned gate to regulate when to communicate while using discrete prototypes that reflect what to communicate for cooperative tasks with partial observability. We provide two types of "Enforcers" for hard and soft budget constraints and present results of communication under different budgets. We show that our method satisfies constraints while yielding the same performance as comparable, unconstrained methods.

* Submitted to IJCAI 2022

Via

Access Paper or Ask Questions

Data-Efficient Learning of High-Quality Controls for Kinodynamic Planning used in Vehicular Navigation

Jan 06, 2022

Seth Karten, Aravind Sivaramakrishnan, Edgar Granados, Troy McMahon, Kostas E. Bekris

Figure 1 for Data-Efficient Learning of High-Quality Controls for Kinodynamic Planning used in Vehicular Navigation

Figure 2 for Data-Efficient Learning of High-Quality Controls for Kinodynamic Planning used in Vehicular Navigation

Figure 3 for Data-Efficient Learning of High-Quality Controls for Kinodynamic Planning used in Vehicular Navigation

Figure 4 for Data-Efficient Learning of High-Quality Controls for Kinodynamic Planning used in Vehicular Navigation

Abstract:This paper aims to improve the path quality and computational efficiency of kinodynamic planners used for vehicular systems. It proposes a learning framework for identifying promising controls during the expansion process of sampling-based motion planners for systems with dynamics. Offline, the learning process is trained to return the highest-quality control that reaches a local goal state (i.e., a waypoint) in the absence of obstacles from an input difference vector between its current state and a local goal state. The data generation scheme provides bounds on the target dispersion and uses state space pruning to ensure high-quality controls. By focusing on the system's dynamics, this process is data efficient and takes place once for a dynamical system, so that it can be used for different environments with modular expansion functions. This work integrates the proposed learning process with a) an exploratory expansion function that generates waypoints with biased coverage over the reachable space, and b) proposes an exploitative expansion function for mobile robots, which generates waypoints using medial axis information. This paper evaluates the learning process and the corresponding planners for a first and second-order differential drive systems. The results show that the proposed integration of learning and planning can produce better quality paths than kinodynamic planning with random controls in fewer iterations and computation time.

* Machine Learning for Motion Planning (MLMP) Workshop at ICRA 2021, Xi'an, China
* Presented at the Machine Learning for Motion Planning (MLMP) Workshop at ICRA 2021, Xi'an, China

Via

Access Paper or Ask Questions

Improving Kinodynamic Planners for Vehicular Navigation with Learned Goal-Reaching Controllers

Oct 08, 2021

Aravind Sivaramakrishnan, Edgar Granados, Seth Karten, Troy McMahon, Kostas E. Bekris

Figure 1 for Improving Kinodynamic Planners for Vehicular Navigation with Learned Goal-Reaching Controllers

Figure 2 for Improving Kinodynamic Planners for Vehicular Navigation with Learned Goal-Reaching Controllers

Figure 3 for Improving Kinodynamic Planners for Vehicular Navigation with Learned Goal-Reaching Controllers

Figure 4 for Improving Kinodynamic Planners for Vehicular Navigation with Learned Goal-Reaching Controllers

Abstract:This paper aims to improve the path quality and computational efficiency of sampling-based kinodynamic planners for vehicular navigation. It proposes a learning framework for identifying promising controls during the expansion process of sampling-based planners. Given a dynamics model, a reinforcement learning process is trained offline to return a low-cost control that reaches a local goal state (i.e., a waypoint) in the absence of obstacles. By focusing on the system's dynamics and not knowing the environment, this process is data-efficient and takes place once for a robotic system. In this way, it can be reused in different environments. The planner generates online local goal states for the learned controller in an informed manner to bias towards the goal and consecutively in an exploratory, random manner. For the informed expansion, local goal states are generated either via (a) medial axis information in environments with obstacles, or (b) wavefront information for setups with traversability costs. The learning process and the resulting planning framework are evaluated for a first and second-order differential drive system, as well as a physically simulated Segway robot. The results show that the proposed integration of learning and planning can produce higher quality paths than sampling-based kinodynamic planning with random controls in fewer iterations and computation time.

Via

Access Paper or Ask Questions