modl.ai
Abstract:Artificial intelligence (AI) has enabled agents to master complex video games, from first-person shooters like Counter-Strike to real-time strategy games such as StarCraft II and racing games like Gran Turismo. While these achievements are notable, applying these AI methods in commercial video game production remains challenging due to computational constraints. In commercial scenarios, the majority of computational resources are allocated to 3D rendering, leaving limited capacity for AI methods, which often demand high computational power, particularly those relying on pixel-based sensors. Moreover, the gaming industry prioritizes creating human-like behavior in AI agents to enhance player experience, unlike academic models that focus on maximizing game performance. This paper introduces a novel methodology for training neural networks via imitation learning to play a complex, commercial-standard, VALORANT-like 2v2 tactical shooter game, requiring only modest CPU hardware during inference. Our approach leverages an innovative, pixel-free perception architecture using a small set of ray-cast sensors, which capture essential spatial information efficiently. These sensors allow AI to perform competently without the computational overhead of traditional methods. Models are trained to mimic human behavior using supervised learning on human trajectory data, resulting in realistic and engaging AI agents. Human evaluation tests confirm that our AI agents provide human-like gameplay experiences while operating efficiently under computational constraints. This offers a significant advancement in AI model development for tactical shooter games and possibly other genres.
Abstract:Large Language Models (LLMs) have demonstrated remarkable performance across various tasks. A promising but largely under-explored area is their potential to facilitate human coordination with many agents. Such capabilities would be useful in domains including disaster response, urban planning, and real-time strategy scenarios. In this work, we introduce (1) a real-time strategy game benchmark designed to evaluate these abilities and (2) a novel framework we term HIVE. HIVE empowers a single human to coordinate swarms of up to 2,000 agents using natural language dialog with an LLM. We present promising results on this multi-agent benchmark, with our hybrid approach solving tasks such as coordinating agent movements, exploiting unit weaknesses, leveraging human annotations, and understanding terrain and strategic points. However, our findings also highlight critical limitations of current models, including difficulties in processing spatial visual information and challenges in formulating long-term strategic plans. This work sheds light on the potential and limitations of LLMs in human-swarm coordination, paving the way for future research in this area. The HIVE project page, which includes videos of the system in action, can be found here: hive.syrkis.com.
Abstract:Human culture relies on collective innovation: our ability to continuously explore how existing elements in our environment can be combined to create new ones. Language is hypothesized to play a key role in human culture, driving individual cognitive capacities and shaping communication. Yet the majority of models of collective innovation assign no cognitive capacities or language abilities to agents. Here, we contribute a computational study of collective innovation where agents are Large Language Models (LLMs) that play Little Alchemy 2, a creative video game originally developed for humans that, as we argue, captures useful aspects of innovation landscapes not present in previous test-beds. We, first, study an LLM in isolation and discover that it exhibits both useful skills and crucial limitations. We, then, study groups of LLMs that share information related to their behaviour and focus on the effect of social connectivity on collective performance. In agreement with previous human and computational studies, we observe that groups with dynamic connectivity out-compete fully-connected groups. Our work reveals opportunities and challenges for future studies of collective innovation that are becoming increasingly relevant as Generative Artificial Intelligence algorithms and humans innovate alongside each other.
Abstract:Due to their architecture and how they are trained, artificial neural networks are typically not robust toward pruning, replacing, or shuffling layers at test time. However, such properties would be desirable for different applications, such as distributed neural network architectures where the order of execution cannot be guaranteed or parts of the network can fail during inference. In this work, we address these issues through a number of proposed training approaches for vision transformers whose most important component is randomizing the execution order of attention modules at training time. We show that with our proposed approaches, vision transformers are indeed capable to adapt to arbitrary layer execution orders at test time assuming one tolerates a reduction (about 20\%) in accuracy at the same model size. We also find that our trained models can be randomly merged with each other resulting in functional ("Frankenstein") models without loss of performance compared to the source models. Finally, we layer-prune our models at test time and find that their performance declines gracefully.
Abstract:Biological neural networks are characterized by their high degree of plasticity, a core property that enables the remarkable adaptability of natural organisms. Importantly, this ability affects both the synaptic strength and the topology of the nervous systems. Artificial neural networks, on the other hand, have been mainly designed as static, fully connected structures that can be notoriously brittle in the face of changing environments and novel inputs. Building on previous works on Neural Developmental Programs (NDPs), we propose a class of self-organizing neural networks capable of synaptic and structural plasticity in an activity and reward-dependent manner which we call Lifelong Neural Developmental Program (LNDP). We present an instance of such a network built on the graph transformer architecture and propose a mechanism for pre-experience plasticity based on the spontaneous activity of sensory neurons. Our results demonstrate the ability of the model to learn from experiences in different control tasks starting from randomly connected or empty networks. We further show that structural plasticity is advantageous in environments necessitating fast adaptation or with non-stationary rewards.
Abstract:Representations for black-box optimisation methods (such as evolutionary algorithms) are traditionally constructed using a delicate manual process. This is in contrast to the representation that maps DNAs to phenotypes in biological organisms, which is at the hear of biological complexity and evolvability. Additionally, the core of this process is fundamentally the same across nearly all forms of life, reflecting their shared evolutionary origin. Generative models have shown promise in being learnable representations for black-box optimisation but they are not per se designed to be easily searchable. Here we present a system that can meta-learn such representation by directly optimising for a representation's ability to generate quality-diversity. In more detail, we show our meta-learning approach can find one Neural Cellular Automata, in which cells can attend to different parts of a "DNA" string genome during development, enabling it to grow different solvable 2D maze structures. We show that the evolved genotype-to-phenotype mappings become more and more evolvable, not only resulting in a faster search but also increasing the quality and diversity of grown artefacts.
Abstract:In biological evolution complex neural structures grow from a handful of cellular ingredients. As genomes in nature are bounded in size, this complexity is achieved by a growth process where cells communicate locally to decide whether to differentiate, proliferate and connect with other cells. This self-organisation is hypothesized to play an important part in the generalisation, and robustness of biological neural networks. Artificial neural networks (ANNs), on the other hand, are traditionally optimized in the space of weights. Thus, the benefits and challenges of growing artificial neural networks remain understudied. Building on the previously introduced Neural Developmental Programs (NDP), in this work we present an algorithm for growing ANNs that solve reinforcement learning tasks. We identify a key challenge: ensuring phenotypic complexity requires maintaining neuronal diversity, but this diversity comes at the cost of optimization stability. To address this, we introduce two mechanisms: (a) equipping neurons with an intrinsic state inherited upon neurogenesis; (b) lateral inhibition, a mechanism inspired by biological growth, which controlls the pace of growth, helping diversity persist. We show that both mechanisms contribute to neuronal diversity and that, equipped with them, NDPs achieve comparable results to existing direct and developmental encodings in complex locomotion tasks
Abstract:This Research through Design paper explores how object detection may be applied to a large digital art museum collection to facilitate new ways of encountering and experiencing art. We present the design and evaluation of an interactive application called SMKExplore, which allows users to explore a museum's digital collection of paintings by browsing through objects detected in the images, as a novel form of open-ended exploration. We provide three contributions. First, we show how an object detection pipeline can be integrated into a design process for visual exploration. Second, we present the design and development of an app that enables exploration of an art museum's collection. Third, we offer reflections on future possibilities for museums and HCI researchers to incorporate object detection techniques into the digitalization of museums.
Abstract:Biological nervous systems are created in a fundamentally different way than current artificial neural networks. Despite its impressive results in a variety of different domains, deep learning often requires considerable engineering effort to design high-performing neural architectures. By contrast, biological nervous systems are grown through a dynamic self-organizing process. In this paper, we take initial steps toward neural networks that grow through a developmental process that mirrors key properties of embryonic development in biological organisms. The growth process is guided by another neural network, which we call a Neural Developmental Program (NDP) and which operates through local communication alone. We investigate the role of neural growth on different machine learning benchmarks and different optimization methods (evolutionary training, online RL, offline RL, and supervised learning). Additionally, we highlight future research directions and opportunities enabled by having self-organization driving the growth of neural networks.
Abstract:Models leveraging both visual and textual data such as Contrastive Language-Image Pre-training (CLIP), are increasingly gaining importance. In this work, we show that despite their versatility, such models are vulnerable to what we refer to as fooling master images. Fooling master images are capable of maximizing the confidence score of a CLIP model for a significant number of widely varying prompts, while being unrecognizable for humans. We demonstrate how fooling master images can be mined by searching the latent space of generative models by means of an evolution strategy or stochastic gradient descent. We investigate the properties of the mined fooling master images, and find that images trained on a small number of image captions potentially generalize to a much larger number of semantically related captions. Further, we evaluate two possible mitigation strategies and find that vulnerability to fooling master examples is closely related to a modality gap in contrastive pre-trained multi-modal networks. From the perspective of vulnerability to off-manifold attacks, we therefore argue for the mitigation of modality gaps in CLIP and related multi-modal approaches. Source code and mined CLIPMasterPrints are available at https://github.com/matfrei/CLIPMasterPrints.