Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Steven James

Word2Minecraft: Generating 3D Game Levels through Large Language Models

Mar 18, 2025

Shuo Huang, Muhammad Umair Nasir, Steven James, Julian Togelius

Abstract:We present Word2Minecraft, a system that leverages large language models to generate playable game levels in Minecraft based on structured stories. The system transforms narrative elements-such as protagonist goals, antagonist challenges, and environmental settings-into game levels with both spatial and gameplay constraints. We introduce a flexible framework that allows for the customization of story complexity, enabling dynamic level generation. The system employs a scaling algorithm to maintain spatial consistency while adapting key game elements. We evaluate Word2Minecraft using both metric-based and human-based methods. Our results show that GPT-4-Turbo outperforms GPT-4o-Mini in most areas, including story coherence and objective enjoyment, while the latter excels in aesthetic appeal. We also demonstrate the system' s ability to generate levels with high map enjoyment, offering a promising step forward in the intersection of story generation and game design. We open-source the code at https://github.com/JMZ-kk/Word2Minecraft/tree/word2mc_v0

Via

Access Paper or Ask Questions

GameTraversalBenchmark: Evaluating Planning Abilities Of Large Language Models Through Traversing 2D Game Maps

Oct 10, 2024

Muhammad Umair Nasir, Steven James, Julian Togelius

Figure 1 for GameTraversalBenchmark: Evaluating Planning Abilities Of Large Language Models Through Traversing 2D Game Maps

Figure 2 for GameTraversalBenchmark: Evaluating Planning Abilities Of Large Language Models Through Traversing 2D Game Maps

Figure 3 for GameTraversalBenchmark: Evaluating Planning Abilities Of Large Language Models Through Traversing 2D Game Maps

Figure 4 for GameTraversalBenchmark: Evaluating Planning Abilities Of Large Language Models Through Traversing 2D Game Maps

Abstract:Large language models (LLMs) have recently demonstrated great success in generating and understanding natural language. While they have also shown potential beyond the domain of natural language, it remains an open question as to what extent and in which way these LLMs can plan. We investigate their planning capabilities by proposing GameTraversalBenchmark (GTB), a benchmark consisting of diverse 2D grid-based game maps. An LLM succeeds if it can traverse through given objectives, with a minimum number of steps and a minimum number of generation errors. We evaluate a number of LLMs on GTB and found that GPT-4-Turbo achieved the highest score of 44.97% on GTB\_Score (GTBS), a composite score that combines the three above criteria. Furthermore, we preliminarily test large reasoning models, namely o1, which scores $67.84\%$ on GTBS, indicating that the benchmark remains challenging for current models. Code, data, and documentation are available at https://github.com/umair-nasir14/Game-Traversal-Benchmark.

* Accepted at 38th Conference on Neural Information Processing Systems (NeurIPS 2024) Track on Datasets and Benchmarks

Via

Access Paper or Ask Questions

Word2World: Generating Stories and Worlds through Large Language Models

May 06, 2024

Muhammad U. Nasir, Steven James, Julian Togelius

Abstract:Large Language Models (LLMs) have proven their worth across a diverse spectrum of disciplines. LLMs have shown great potential in Procedural Content Generation (PCG) as well, but directly generating a level through a pre-trained LLM is still challenging. This work introduces Word2World, a system that enables LLMs to procedurally design playable games through stories, without any task-specific fine-tuning. Word2World leverages the abilities of LLMs to create diverse content and extract information. Combining these abilities, LLMs can create a story for the game, design narrative, and place tiles in appropriate places to create coherent worlds and playable games. We test Word2World with different LLMs and perform a thorough ablation study to validate each step. We open-source the code at https://github.com/umair-nasir14/Word2World.

Via

Access Paper or Ask Questions

MinePlanner: A Benchmark for Long-Horizon Planning in Large Minecraft Worlds

Dec 20, 2023

William Hill, Ireton Liu, Anita De Mello Koch, Damion Harvey, George Konidaris, Steven James

Abstract:We propose a new benchmark for planning tasks based on the Minecraft game. Our benchmark contains 45 tasks overall, but also provides support for creating both propositional and numeric instances of new Minecraft tasks automatically. We benchmark numeric and propositional planning systems on these tasks, with results demonstrating that state-of-the-art planners are currently incapable of dealing with many of the challenges advanced by our new benchmark, such as scaling to instances with thousands of objects. Based on these results, we identify areas of improvement for future planners. Our framework is made available at https://github.com/IretonLiu/mine-pddl/.

Via

Access Paper or Ask Questions

Counting Reward Automata: Sample Efficient Reinforcement Learning Through the Exploitation of Reward Function Structure

Dec 18, 2023

Tristan Bester, Benjamin Rosman, Steven James, Geraud Nangue Tasse

Abstract:We present counting reward automata-a finite state machine variant capable of modelling any reward function expressible as a formal language. Unlike previous approaches, which are limited to the expression of tasks as regular languages, our framework allows for tasks described by unrestricted grammars. We prove that an agent equipped with such an abstract machine is able to solve a larger set of tasks than those utilising current approaches. We show that this increase in expressive power does not come at the cost of increased automaton complexity. A selection of learning algorithms are presented which exploit automaton structure to improve sample efficiency. We show that the state machines required in our formulation can be specified from natural language task descriptions using large language models. Empirical results demonstrate that our method outperforms competing approaches in terms of sample efficiency, automaton complexity, and task completion.

* 14 pages, 11 Figures, Submitted to AAAI W25: Neuro-Symbolic Learning and Reasoning in the era of Large Language Models (NuCLeaR)

Via

Access Paper or Ask Questions

Dynamics Generalisation in Reinforcement Learning via Adaptive Context-Aware Policies

Oct 25, 2023

Michael Beukman, Devon Jarvis, Richard Klein, Steven James, Benjamin Rosman

Figure 1 for Dynamics Generalisation in Reinforcement Learning via Adaptive Context-Aware Policies

Figure 2 for Dynamics Generalisation in Reinforcement Learning via Adaptive Context-Aware Policies

Figure 3 for Dynamics Generalisation in Reinforcement Learning via Adaptive Context-Aware Policies

Figure 4 for Dynamics Generalisation in Reinforcement Learning via Adaptive Context-Aware Policies

Abstract:While reinforcement learning has achieved remarkable successes in several domains, its real-world application is limited due to many methods failing to generalise to unfamiliar conditions. In this work, we consider the problem of generalising to new transition dynamics, corresponding to cases in which the environment's response to the agent's actions differs. For example, the gravitational force exerted on a robot depends on its mass and changes the robot's mobility. Consequently, in such cases, it is necessary to condition an agent's actions on extrinsic state information and pertinent contextual information reflecting how the environment responds. While the need for context-sensitive policies has been established, the manner in which context is incorporated architecturally has received less attention. Thus, in this work, we present an investigation into how context information should be incorporated into behaviour learning to improve generalisation. To this end, we introduce a neural network architecture, the Decision Adapter, which generates the weights of an adapter module and conditions the behaviour of an agent on the context information. We show that the Decision Adapter is a useful generalisation of a previously proposed architecture and empirically demonstrate that it results in superior generalisation performance compared to previous approaches in several environments. Beyond this, the Decision Adapter is more robust to irrelevant distractor variables than several alternative methods.

* Accepted to NeurIPS 2023

Via

Access Paper or Ask Questions

LLMatic: Neural Architecture Search via Large Language Models and Quality-Diversity Optimization

Jun 01, 2023

Muhammad U. Nasir, Sam Earle, Julian Togelius, Steven James, Christopher Cleghorn

Figure 1 for LLMatic: Neural Architecture Search via Large Language Models and Quality-Diversity Optimization

Figure 2 for LLMatic: Neural Architecture Search via Large Language Models and Quality-Diversity Optimization

Figure 3 for LLMatic: Neural Architecture Search via Large Language Models and Quality-Diversity Optimization

Figure 4 for LLMatic: Neural Architecture Search via Large Language Models and Quality-Diversity Optimization

Abstract:Large Language Models (LLMs) have emerged as powerful tools capable of accomplishing a broad spectrum of tasks. Their abilities span numerous areas, and one area where they have made a significant impact is in the domain of code generation. In this context, we view LLMs as mutation and crossover tools. Meanwhile, Quality-Diversity (QD) algorithms are known to discover diverse and robust solutions. By merging the code-generating abilities of LLMs with the diversity and robustness of QD solutions, we introduce LLMatic, a Neural Architecture Search (NAS) algorithm. While LLMs struggle to conduct NAS directly through prompts, LLMatic uses a procedural approach, leveraging QD for prompts and network architecture to create diverse and highly performant networks. We test LLMatic on the CIFAR-10 image classification benchmark, demonstrating that it can produce competitive networks with just $2,000$ searches, even without prior knowledge of the benchmark domain or exposure to any previous top-performing models for the benchmark.

Via

Access Paper or Ask Questions

ROSARL: Reward-Only Safe Reinforcement Learning

May 31, 2023

Geraud Nangue Tasse, Tamlin Love, Mark Nemecek, Steven James, Benjamin Rosman

Abstract:An important problem in reinforcement learning is designing agents that learn to solve tasks safely in an environment. A common solution is for a human expert to define either a penalty in the reward function or a cost to be minimised when reaching unsafe states. However, this is non-trivial, since too small a penalty may lead to agents that reach unsafe states, while too large a penalty increases the time to convergence. Additionally, the difficulty in designing reward or cost functions can increase with the complexity of the problem. Hence, for a given environment with a given set of unsafe states, we are interested in finding the upper bound of rewards at unsafe states whose optimal policies minimise the probability of reaching those unsafe states, irrespective of task rewards. We refer to this exact upper bound as the "Minmax penalty", and show that it can be obtained by taking into account both the controllability and diameter of an environment. We provide a simple practical model-free algorithm for an agent to learn this Minmax penalty while learning the task policy, and demonstrate that using it leads to agents that learn safe policies in high-dimensional continuous control environments.

Via

Access Paper or Ask Questions

Hierarchically Composing Level Generators for the Creation of Complex Structures

Feb 03, 2023

Michael Beukman, Manuel Fokam, Marcel Kruger, Guy Axelrod, Muhammad Nasir, Branden Ingram, Benjamin Rosman, Steven James

Figure 1 for Hierarchically Composing Level Generators for the Creation of Complex Structures

Figure 2 for Hierarchically Composing Level Generators for the Creation of Complex Structures

Figure 3 for Hierarchically Composing Level Generators for the Creation of Complex Structures

Figure 4 for Hierarchically Composing Level Generators for the Creation of Complex Structures

Abstract:Procedural content generation (PCG) is a growing field, with numerous applications in the video game industry, and great potential to help create better games at a fraction of the cost of manual creation. However, much of the work in PCG is focused on generating relatively straightforward levels in simple games, as it is challenging to design an optimisable objective function for complex settings. This limits the applicability of PCG to more complex and modern titles, hindering its adoption in industry. Our work aims to address this limitation by introducing a compositional level generation method, which recursively composes simple, low-level generators together to construct large and complex creations. This approach allows for easily-optimisable objectives and the ability to design a complex structure in an interpretable way by referencing lower-level components. We empirically demonstrate that our method outperforms a non-compositional baseline by more accurately satisfying a designer's functional requirements in several tasks. Finally, we provide a qualitative showcase (in Minecraft) illustrating the large and complex, but still coherent, structures that were generated using simple base generators.

* Code is available at https://github.com/Michael-Beukman/MCHAMR. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Via

Access Paper or Ask Questions

Augmentative Topology Agents For Open-Ended Learning

Oct 20, 2022

Muhammad Umair Nasir, Michael Beukman, Steven James, Christopher Wesley Cleghorn

Figure 1 for Augmentative Topology Agents For Open-Ended Learning

Figure 2 for Augmentative Topology Agents For Open-Ended Learning

Figure 3 for Augmentative Topology Agents For Open-Ended Learning

Figure 4 for Augmentative Topology Agents For Open-Ended Learning

Abstract:In this work, we tackle the problem of open-ended learning by introducing a method that simultaneously evolves agents and increasingly challenging environments. Unlike previous open-ended approaches that optimize agents using a fixed neural network topology, we hypothesize that generalization can be improved by allowing agents' controllers to become more complex as they encounter more difficult environments. Our method, Augmentative Topology EPOET (ATEP), extends the Enhanced Paired Open-Ended Trailblazer (EPOET) algorithm by allowing agents to evolve their own neural network structures over time, adding complexity and capacity as necessary. Empirical results demonstrate that ATEP results in general agents capable of solving more environments than a fixed-topology baseline. We also investigate mechanisms for transferring agents between environments and find that a species-based approach further improves the performance and generalization of agents.

* Accepted to Life-long Learning of High-level Cognitive and Reasoning Skills Workshop at IROS 2022

Via

Access Paper or Ask Questions