Julian Togelius

modl.ai

Enhancing Player Enjoyment with a Two-Tier DRL and LLM-Based Agent System for Fighting Games

Apr 10, 2025

Shouren Wang, Zehua Jiang, Fernando Sliva, Sam Earle, Julian Togelius

Abstract: Deep reinforcement learning (DRL) has effectively enhanced gameplay experiences and game design across various game genres. However, few studies on fighting game agents have focused explicitly on enhancing player enjoyment, a critical factor for both developers and players. To address this gap and establish a practical baseline for designing enjoyability-focused agents, we propose a two-tier agent (TTA) system and conduct experiments in the classic fighting game Street Fighter II. The first tier of TTA employs a task-oriented network architecture, modularized reward functions, and hybrid training to produce diverse and skilled DRL agents. In the second tier of TTA, a Large Language Model Hyper-Agent, leveraging players' playing data and feedback, dynamically selects suitable DRL opponents. In addition, we investigate and model several key factors that affect the enjoyability of the opponent. The experiments demonstrate improvements ranging from 64.36% to 156.36% in the execution of advanced skills over baseline methods. The trained agents also exhibit distinct game-playing styles. Additionally, we conducted a small-scale user study, and the overall enjoyment reported in the players' feedback validates the effectiveness of our TTA system.

* 15 pages, 8 figures. Submitted to a peer-reviewed conference, under review

The Procedural Content Generation Benchmark: An Open-source Testbed for Generative Challenges in Games

Mar 27, 2025

Ahmed Khalifa, Roberto Gallotta, Matthew Barthet, Antonios Liapis, Julian Togelius, Georgios N. Yannakakis

Abstract: This paper introduces the Procedural Content Generation Benchmark for evaluating generative algorithms on different game content creation tasks. The benchmark comes with 12 game-related problems with multiple variants on each problem. Problems vary from creating levels of different kinds to creating rule sets for simple arcade games. Each problem has its own content representation, control parameters, and evaluation metrics for quality, diversity, and controllability. This benchmark is intended as a first step towards a standardized way of comparing generative algorithms. We use the benchmark to score three baseline algorithms: a random generator, an evolution strategy, and a genetic algorithm. Results show that some problems are easier to solve than others, and reveal the impact that the chosen objective has on the quality, diversity, and controllability of the generated artifacts.

* 12 pages, 4 figures, 2 tables, published at FDG2025

Word2Minecraft: Generating 3D Game Levels through Large Language Models

Mar 18, 2025

Shuo Huang, Muhammad Umair Nasir, Steven James, Julian Togelius

Abstract: We present Word2Minecraft, a system that leverages large language models to generate playable game levels in Minecraft based on structured stories. The system transforms narrative elements, such as protagonist goals, antagonist challenges, and environmental settings, into game levels with both spatial and gameplay constraints. We introduce a flexible framework that allows for the customization of story complexity, enabling dynamic level generation. The system employs a scaling algorithm to maintain spatial consistency while adapting key game elements. We evaluate Word2Minecraft using both metric-based and human-based methods. Our results show that GPT-4-Turbo outperforms GPT-4o-Mini in most areas, including story coherence and objective enjoyment, while the latter excels in aesthetic appeal. We also demonstrate the system's ability to generate levels with high map enjoyment, offering a promising step forward in the intersection of story generation and game design. We open-source the code at https://github.com/JMZ-kk/Word2Minecraft/tree/word2mc_v0

Amorphous Fortress Online: Collaboratively Designing Open-Ended Multi-Agent AI and Game Environments

Feb 08, 2025

M Charity, Mayu Wilson, Steven Lee, Dipika Rajesh, Sam Earle, Julian Togelius

Abstract: This work introduces Amorphous Fortress Online -- a web-based platform where users can design petri-dish-like environments and games consisting of multi-agent AI characters. Users can play, create, and share artificial life and game environments made up of microscopic but transparent finite-state machine agents that interact with each other. The website features multiple interactive editors and accessible settings to view the multi-agent interactions directly from the browser. This system serves to provide a database of thematically diverse AI and game environments that use the emergent behaviors of simple AI agents.

Human-like Bots for Tactical Shooters Using Compute-Efficient Sensors

Dec 30, 2024

Niels Justesen, Maria Kaselimi, Sam Snodgrass, Miruna Vozaru, Matthew Schlegel, Jonas Wingren, Gabriella A. B. Barros, Tobias Mahlmann, Shyam Sudhakaran, Wesley Kerr (+5 more)

Abstract: Artificial intelligence (AI) has enabled agents to master complex video games, from first-person shooters like Counter-Strike to real-time strategy games such as StarCraft II and racing games like Gran Turismo. While these achievements are notable, applying these AI methods in commercial video game production remains challenging due to computational constraints. In commercial scenarios, the majority of computational resources are allocated to 3D rendering, leaving limited capacity for AI methods, which often demand high computational power, particularly those relying on pixel-based sensors. Moreover, the gaming industry prioritizes creating human-like behavior in AI agents to enhance player experience, unlike academic models that focus on maximizing game performance. This paper introduces a novel methodology for training neural networks via imitation learning to play a complex, commercial-standard, VALORANT-like 2v2 tactical shooter game, requiring only modest CPU hardware during inference. Our approach leverages an innovative, pixel-free perception architecture using a small set of ray-cast sensors, which capture essential spatial information efficiently. These sensors allow AI to perform competently without the computational overhead of traditional methods. Models are trained to mimic human behavior using supervised learning on human trajectory data, resulting in realistic and engaging AI agents. Human evaluation tests confirm that our AI agents provide human-like gameplay experiences while operating efficiently under computational constraints. This offers a significant advancement in AI model development for tactical shooter games and possibly other genres.
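
To make the idea of a pixel-free, ray-cast observation concrete, here is a minimal sketch on a 2D tile grid. It is not the paper's sensor design: the grid layout, the `cast_ray` and `sense` helpers, the ray count, and the step size are all assumptions chosen for illustration. Each ray is marched in small steps until it hits a wall tile, and the resulting distances form a small vector that a policy network could take as input instead of pixels.

```python
import math

def cast_ray(grid, x, y, angle, max_dist=10.0, step=0.05):
    """Return the distance from (x, y) to the first wall cell along `angle`."""
    dx, dy = math.cos(angle), math.sin(angle)
    for i in range(int(max_dist / step) + 1):
        dist = i * step
        col, row = math.floor(x + dx * dist), math.floor(y + dy * dist)
        outside = not (0 <= row < len(grid) and 0 <= col < len(grid[0]))
        # Leaving the map counts as hitting a wall (hypothetical convention).
        if outside or grid[row][col] == "#":
            return dist
    return max_dist

def sense(grid, x, y, heading, num_rays=3, fov=math.pi, max_dist=10.0):
    """Fan `num_rays` rays across the field of view; readings are scaled to [0, 1]."""
    readings = []
    for i in range(num_rays):
        offset = 0.0 if num_rays == 1 else fov * (i / (num_rays - 1) - 0.5)
        readings.append(cast_ray(grid, x, y, heading + offset, max_dist) / max_dist)
    return readings

# Toy corridor: '#' is a wall, '.' is free space (illustrative map only).
GRID = ["#####",
        "#...#",
        "#####"]
readings = sense(GRID, 1.5, 1.5, heading=0.0)
print(readings)
```

The observation here has one number per ray, so its cost scales with the ray count and not with screen resolution, which is the property the abstract emphasizes.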

TraSCE: Trajectory Steering for Concept Erasure

Dec 10, 2024

Anubhav Jain, Yuya Kobayashi, Takashi Shibuya, Yuhta Takida, Nasir Memon, Julian Togelius, Yuki Mitsufuji

Abstract: Recent advancements in text-to-image diffusion models have brought them to the public spotlight, becoming widely accessible and embraced by everyday users. However, these models have been shown to generate harmful content such as not-safe-for-work (NSFW) images. While approaches have been proposed to erase such abstract concepts from the models, jail-breaking techniques have succeeded in bypassing such safety measures. In this paper, we propose TraSCE, an approach to guide the diffusion trajectory away from generating harmful content. Our approach is based on negative prompting, but as we show in this paper, conventional negative prompting is not a complete solution and can easily be bypassed in some corner cases. To address this issue, we first propose a modification of conventional negative prompting. Furthermore, we introduce a localized loss-based guidance that enhances the modified negative prompting technique by steering the diffusion trajectory. We demonstrate that our proposed method achieves state-of-the-art results on various benchmarks in removing harmful content, including ones proposed by red teams, and in erasing artistic styles and objects. Our proposed approach does not require any training, weight modifications, or training data (either image or prompt), making it easier for model owners to erase new concepts.

Understanding trade-offs in classifier bias with quality-diversity optimization: an application to talent management

Nov 25, 2024

Catalina M Jaramillo, Paul Squires, Julian Togelius

Abstract: Fairness, the impartial treatment of individuals or groups regardless of their inherent or acquired characteristics [20], is a critical challenge for the successful implementation of Artificial Intelligence (AI) in multiple fields like finances, human capital, and housing. A major struggle for the development of fair AI models lies in the bias implicit in the data available to train such models. Filtering or sampling the dataset before training can help ameliorate model bias but can also reduce model performance, and the impact on bias can be opaque. In this paper, we propose a method for visualizing the biases inherent in a dataset and understanding the potential trade-offs between fairness and accuracy. Our method builds on quality-diversity optimization, in particular Covariance Matrix Adaptation Multi-dimensional Archive of Phenotypic Elites (MAP-Elites). Our method provides a visual representation of bias in models, allows users to identify models within a minimal threshold of fairness, and determines the trade-off between fairness and accuracy.
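
For readers unfamiliar with quality-diversity optimization, the following is a minimal sketch of vanilla MAP-Elites, not the Covariance Matrix Adaptation variant named in the abstract and not the authors' implementation. The archive keeps the best solution found in each bin of a descriptor axis, so the final archive shows how the best achievable quality varies along that axis. The `toy_evaluate` function, its "accuracy" and "bias gap" formulas, the mutation rate, and the bin count are all hypothetical stand-ins.

```python
import random

def map_elites(evaluate, dims, bins, iterations=2000, seed=0):
    """Vanilla MAP-Elites: keep the best solution found in each descriptor cell."""
    rng = random.Random(seed)
    archive = {}  # cell index -> (fitness, genome)
    for _ in range(iterations):
        if archive and rng.random() < 0.9:
            # Mutate a randomly chosen elite with Gaussian noise, clipped to [0, 1].
            _, parent = rng.choice(list(archive.values()))
            genome = [min(1.0, max(0.0, g + rng.gauss(0, 0.1))) for g in parent]
        else:
            # Occasionally sample a fresh random genome to keep exploring.
            genome = [rng.random() for _ in range(dims)]
        fitness, descriptor = evaluate(genome)
        cell = min(bins - 1, int(descriptor * bins))
        if cell not in archive or fitness > archive[cell][0]:
            archive[cell] = (fitness, genome)
    return archive

def toy_evaluate(genome):
    """Hypothetical proxies: quality peaks at 0.7 per gene; descriptor is the mean gene."""
    accuracy = 1.0 - sum((g - 0.7) ** 2 for g in genome) / len(genome)
    bias_gap = sum(genome) / len(genome)  # stands in for a fairness measure in [0, 1]
    return accuracy, bias_gap

archive = map_elites(toy_evaluate, dims=3, bins=10)
for cell in sorted(archive):
    print(cell, round(archive[cell][0], 3))
```

Reading the printed archive from the lowest to the highest cell gives a rough trade-off curve: the best quality reachable at each level of the descriptor.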

Classifier-Free Guidance inside the Attraction Basin May Cause Memorization

Nov 23, 2024

Anubhav Jain, Yuya Kobayashi, Takashi Shibuya, Yuhta Takida, Nasir Memon, Julian Togelius, Yuki Mitsufuji

Abstract: Diffusion models are prone to exactly reproducing images from the training data. This exact reproduction of the training data is concerning as it can lead to copyright infringement and/or leakage of privacy-sensitive information. In this paper, we present a novel way to understand the memorization phenomenon, and propose a simple yet effective approach to mitigate it. We argue that memorization occurs because of an attraction basin in the denoising process which steers the diffusion trajectory towards a memorized image. However, this can be mitigated by guiding the diffusion trajectory away from the attraction basin by not applying classifier-free guidance until an ideal transition point, after which classifier-free guidance is applied. This leads to the generation of non-memorized images that are high in image quality and well-aligned with the conditioning mechanism. To further improve on this, we present a new guidance technique, opposite guidance, that escapes the attraction basin sooner in the denoising process. We demonstrate the existence of attraction basins in various scenarios in which memorization occurs, and we show that our proposed approach successfully mitigates memorization.
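
The delayed-guidance idea can be sketched with the standard classifier-free guidance combination, in which the guided noise prediction is the unconditional prediction plus a scaled difference toward the conditional one. The sketch below is an illustration under stated assumptions, not the paper's method: the transition point is a fixed step index here (how the paper chooses it is not given in the abstract), "no guidance" is modelled as `early_scale=0.0`, and the function names and the scale of 7.5 are hypothetical. Noise predictions are plain lists standing in for model outputs.

```python
def guided_noise(eps_uncond, eps_cond, scale):
    """Standard classifier-free guidance: eps_uncond + scale * (eps_cond - eps_uncond)."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]

def guidance_scale(step, transition_step, scale=7.5, early_scale=0.0):
    """Use `early_scale` before the (assumed, fixed) transition step and `scale` after it."""
    return early_scale if step < transition_step else scale

def guidance_schedule(num_steps, transition_step):
    """Per-step guidance scales for a denoising loop of `num_steps` steps."""
    return [guidance_scale(t, transition_step) for t in range(num_steps)]

scales = guidance_schedule(num_steps=6, transition_step=2)
print(scales)

# Toy predictions (made-up numbers) combined at an early and a late step.
eps_u, eps_c = [0.0, 1.0], [1.0, 1.0]
early = guided_noise(eps_u, eps_c, scales[0])
late = guided_noise(eps_u, eps_c, scales[-1])
```

Passing a negative `early_scale` would push the prediction away from the conditional direction during the early steps; whether that corresponds to the opposite guidance proposed in the paper is an assumption, not something the abstract confirms.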

GameTraversalBenchmark: Evaluating Planning Abilities Of Large Language Models Through Traversing 2D Game Maps

Oct 10, 2024

Muhammad Umair Nasir, Steven James, Julian Togelius

Abstract: Large language models (LLMs) have recently demonstrated great success in generating and understanding natural language. While they have also shown potential beyond the domain of natural language, it remains an open question as to what extent and in which way these LLMs can plan. We investigate their planning capabilities by proposing GameTraversalBenchmark (GTB), a benchmark consisting of diverse 2D grid-based game maps. An LLM succeeds if it can traverse through given objectives, with a minimum number of steps and a minimum number of generation errors. We evaluate a number of LLMs on GTB and find that GPT-4-Turbo achieved the highest score of 44.97% on GTB_Score (GTBS), a composite score that combines the three criteria above. Furthermore, we preliminarily test large reasoning models, namely o1, which scores 67.84% on GTBS, indicating that the benchmark remains challenging for current models. Code, data, and documentation are available at https://github.com/umair-nasir14/Game-Traversal-Benchmark.

* Accepted at 38th Conference on Neural Information Processing Systems (NeurIPS 2024) Track on Datasets and Benchmarks

PCGRL+: Scaling, Control and Generalization in Reinforcement Learning Level Generators

Aug 22, 2024

Sam Earle, Zehua Jiang, Julian Togelius

Abstract: Procedural Content Generation via Reinforcement Learning (PCGRL) has been introduced as a means by which controllable designer agents can be trained based only on a set of computable metrics acting as a proxy for the level's quality and key characteristics. While PCGRL offers a unique set of affordances for game designers, it is constrained by the compute-intensive process of training RL agents, and has so far been limited to generating relatively small levels. To address this issue of scale, we implement several PCGRL environments in Jax so that all aspects of learning and simulation happen in parallel on the GPU, resulting in faster environment simulation; removing the CPU-GPU information-transfer bottleneck during RL training; and ultimately resulting in significantly improved training speed. We replicate several key results from prior works in this new framework, letting models train for much longer than previously studied, and evaluating their behavior after 1 billion timesteps. Aiming for greater control for human designers, we introduce randomized level sizes and frozen "pinpoints" of pivotal game tiles as further ways of countering overfitting. To test the generalization ability of learned generators, we evaluate models on large, out-of-distribution map sizes, and find that partial observation sizes learn more robust design strategies.

* 8 pages, 7 figures, 6 tables. Published at IEEE Conference on Games, 2024
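
As a rough illustration of the two controls mentioned in the abstract, randomized level sizes and frozen "pinpoints", here is a small sketch in plain Python. It is not the paper's JAX environment: the helper names, tile symbols, size range, and the rule that an edit to a frozen cell is simply rejected are assumptions made for this example.

```python
import random

def random_level(rng, min_size=4, max_size=8, tiles=".#"):
    """Sample a level with random height and width (hypothetical size range)."""
    height = rng.randint(min_size, max_size)
    width = rng.randint(min_size, max_size)
    return [[rng.choice(tiles) for _ in range(width)] for _ in range(height)]

def place_pinpoints(rng, level, tile="D", count=2):
    """Write `count` pivotal tiles at random cells and return the frozen coordinates."""
    cells = [(r, c) for r in range(len(level)) for c in range(len(level[0]))]
    frozen = set(rng.sample(cells, count))
    for r, c in frozen:
        level[r][c] = tile
    return frozen

def apply_edit(level, frozen, row, col, tile):
    """Apply a generator edit unless the cell is frozen; return whether it took effect."""
    if (row, col) in frozen:
        return False
    level[row][col] = tile
    return True

rng = random.Random(0)
level = random_level(rng)
frozen = place_pinpoints(rng, level)
pinned_row, pinned_col = sorted(frozen)[0]
blocked = apply_edit(level, frozen, pinned_row, pinned_col, "#")
print(len(level), len(level[0]), sorted(frozen), blocked)
```

Because the level size and the pinned cells change from episode to episode, a generator trained this way cannot rely on one fixed layout, which is the anti-overfitting intuition the abstract describes.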
