Jenny Zhang

Meng Yee

Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents

May 29, 2025

Jenny Zhang, Shengran Hu, Cong Lu, Robert Lange, Jeff Clune

Abstract: Today's AI systems have human-designed, fixed architectures and cannot autonomously and continuously improve themselves. The advance of AI could itself be automated. If done safely, that would accelerate AI development and allow us to reap its benefits much sooner. Meta-learning can automate the discovery of novel algorithms, but is limited by first-order improvements and the human design of a suitable search space. The Gödel machine proposed a theoretical alternative: a self-improving AI that repeatedly modifies itself in a provably beneficial manner. Unfortunately, proving that most changes are net beneficial is impossible in practice. We introduce the Darwin Gödel Machine (DGM), a self-improving system that iteratively modifies its own code (thereby also improving its ability to modify its own codebase) and empirically validates each change using coding benchmarks. Inspired by Darwinian evolution and open-endedness research, the DGM maintains an archive of generated coding agents. It grows the archive by sampling an agent from it and using a foundation model to create a new, interesting, version of the sampled agent. This open-ended exploration forms a growing tree of diverse, high-quality agents and allows the parallel exploration of many different paths through the search space. Empirically, the DGM automatically improves its coding capabilities (e.g., better code editing tools, long-context window management, peer-review mechanisms), increasing performance on SWE-bench from 20.0% to 50.0%, and on Polyglot from 14.2% to 30.7%. Furthermore, the DGM significantly outperforms baselines without self-improvement or open-ended exploration. All experiments were done with safety precautions (e.g., sandboxing, human oversight). The DGM is a significant step toward self-improving AI, capable of gathering its own stepping stones along paths that unfold into endless innovation.

* Code at https://github.com/jennyzzt/dgm
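
The archive loop described in the abstract (sample an agent, create a modified version, validate it empirically, add it to the archive) can be sketched as below. This is a toy illustration under stated assumptions, not the authors' implementation: `Agent`, `grow_archive`, `toy_propose`, and `toy_evaluate` are hypothetical names, a single number stands in for an agent's code, a random perturbation stands in for the foundation model, and the score-weighted parent selection is an assumed rule that the abstract does not specify.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Agent:
    code: float             # toy stand-in for an agent's source code
    score: float            # benchmark score, assumed to lie in [0, 1]
    parent: Optional[int]   # archive index of the parent, None for the seed


def grow_archive(
    seed_code: float,
    propose: Callable[[float, random.Random], float],
    evaluate: Callable[[float], float],
    steps: int,
    rng_seed: int = 0,
) -> List[Agent]:
    """Grow a tree of agents by repeatedly sampling a parent and modifying it."""
    rng = random.Random(rng_seed)
    archive = [Agent(seed_code, evaluate(seed_code), None)]
    for _ in range(steps):
        # Assumed selection rule: favor higher scores, but keep every agent
        # reachable so that weak agents can still act as stepping stones.
        weights = [agent.score + 0.05 for agent in archive]
        parent_idx = rng.choices(range(len(archive)), weights=weights, k=1)[0]
        child_code = propose(archive[parent_idx].code, rng)
        # Every change is validated empirically; this sketch keeps all children.
        archive.append(Agent(child_code, evaluate(child_code), parent_idx))
    return archive


def toy_propose(code: float, rng: random.Random) -> float:
    # Hypothetical stand-in for a foundation model rewriting the agent.
    return code + rng.gauss(0.0, 0.5)


def toy_evaluate(code: float) -> float:
    # Hypothetical benchmark: the score peaks at 1.0 when code == 3.0.
    return max(0.0, 1.0 - abs(code - 3.0) / 3.0)


archive = grow_archive(0.0, toy_propose, toy_evaluate, steps=50)
best = max(archive, key=lambda agent: agent.score)
print(len(archive), round(best.score, 3))
```

Storing the parent index with each agent records the tree of lineages, so several branches can be explored side by side rather than only the current best.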


OMNI-EPIC: Open-endedness via Models of human Notions of Interestingness with Environments Programmed in Code

May 24, 2024

Maxence Faldor, Jenny Zhang, Antoine Cully, Jeff Clune

Abstract: Open-ended and AI-generating algorithms aim to continuously generate and solve increasingly complex tasks indefinitely, offering a promising path toward more general intelligence. To accomplish this grand vision, learning must occur within a vast array of potential tasks. Existing approaches to automatically generating environments are constrained within manually predefined, often narrow distributions of environments, limiting their ability to create any learning environment. To address this limitation, we introduce a novel framework, OMNI-EPIC, that augments previous work in Open-endedness via Models of human Notions of Interestingness (OMNI) with Environments Programmed in Code (EPIC). OMNI-EPIC leverages foundation models to autonomously generate code specifying the next learnable (i.e., not too easy or difficult for the agent's current skill set) and interesting (e.g., worthwhile and novel) tasks. OMNI-EPIC generates both environments (e.g., an obstacle course) and reward functions (e.g., progress through the obstacle course quickly without touching red objects), enabling it, in principle, to create any simulatable learning task. We showcase the explosive creativity of OMNI-EPIC, which continuously innovates to suggest new, interesting learning challenges. We also highlight how OMNI-EPIC can adapt to reinforcement learning agents' learning progress, generating tasks that are of suitable difficulty. Overall, OMNI-EPIC can endlessly create learnable and interesting environments, further propelling the development of self-improving AI systems and AI-Generating Algorithms. Project website with videos: https://dub.sh/omniepic


Learning Emergent Gaits with Decentralized Phase Oscillators: on the role of Observations, Rewards, and Feedback

Feb 17, 2024

Jenny Zhang, Steve Heim, Se Hwan Jeon, Sangbae Kim

Abstract: We present a minimal phase oscillator model for learning quadrupedal locomotion. Each of the four oscillators is coupled only to itself and its corresponding leg through local feedback of the ground reaction force, which can be interpreted as an observer feedback gain. We interpret the oscillator itself as a latent contact state-estimator. Through a systematic ablation study, we show that the combination of phase observations, simple phase-based rewards, and the local feedback dynamics induces policies that exhibit emergent gait preferences, while using a reduced set of simple rewards, and without prescribing a specific gait. The code is open-source, and a video synopsis is available at https://youtu.be/1NKQ0rSV3jU.

* ICRA 2024, 8 pages 7 Figures
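
The decentralized structure described in the abstract, four oscillators that each see only their own phase and their own leg's ground reaction force, can be sketched as a single Euler integration step. The abstract does not give the equations, so the feedback term `gain * force * cos(phase)` is an assumed form chosen for illustration, and `step_phases` and all parameter values are hypothetical.

```python
import math
from typing import List, Sequence


def step_phases(
    phases: Sequence[float],
    forces: Sequence[float],
    omega: float,
    gain: float,
    dt: float,
) -> List[float]:
    """Advance each leg's phase by one Euler step, using only local signals."""
    new_phases = []
    for phase, force in zip(phases, forces):
        # Assumed dynamics: a constant natural frequency plus a local
        # correction driven by this leg's own ground reaction force.
        rate = omega - gain * force * math.cos(phase)
        new_phases.append((phase + rate * dt) % (2.0 * math.pi))
    return new_phases


phases = [0.0, math.pi, math.pi, 0.0]   # hypothetical initial leg phases
forces = [1.0, 0.0, 0.0, 1.0]           # hypothetical normalized leg loads
for _ in range(100):
    phases = step_phases(phases, forces, omega=2.0 * math.pi, gain=0.5, dt=0.01)
print([round(p, 3) for p in phases])
```

No term couples one leg's phase to another's, so any coordination between legs in such a model has to arise through the body and the ground rather than through the oscillators themselves.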


Quality-Diversity through AI Feedback

Oct 31, 2023

Herbie Bradley, Andrew Dai, Hannah Teufel, Jenny Zhang, Koen Oostermeijer, Marco Bellagente, Jeff Clune, Kenneth Stanley, Grégory Schott, Joel Lehman


Abstract: In many text-generation problems, users may prefer not only a single response, but a diverse range of high-quality outputs from which to choose. Quality-diversity (QD) search algorithms aim at such outcomes, by continually improving and diversifying a population of candidates. However, the applicability of QD to qualitative domains, like creative writing, has been limited by the difficulty of algorithmically specifying measures of quality and diversity. Interestingly, recent developments in language models (LMs) have enabled guiding search through AI feedback, wherein LMs are prompted in natural language to evaluate qualitative aspects of text. Leveraging this development, we introduce Quality-Diversity through AI Feedback (QDAIF), wherein an evolutionary algorithm applies LMs to both generate variation and evaluate the quality and diversity of candidate text. When assessed on creative writing domains, QDAIF covers more of a specified search space with high-quality samples than do non-QD controls. Further, human evaluation of QDAIF-generated creative texts validates reasonable agreement between AI and human evaluation. Our results thus highlight the potential of AI feedback to guide open-ended search for creative and original solutions, providing a recipe that seemingly generalizes to many domains and modalities. In this way, QDAIF is a step towards AI systems that can independently search, diversify, evaluate, and improve, which are among the core skills underlying human society's capacity for innovation.

* minor addition edits for improved clarity
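
The quality-diversity idea in the abstract, keeping the best candidate found so far for each region of a diversity axis, can be sketched as below. The abstract says only that an evolutionary algorithm uses LMs for variation and evaluation, so the grid-style archive is an assumption, and `qd_search`, `bin_index`, and the toy functions are hypothetical stand-ins for the LM calls.

```python
import random
from typing import Callable, Dict, Tuple


def bin_index(value: float, bins: int) -> int:
    """Map a diversity value in [0, 1] to one of `bins` archive cells."""
    return min(bins - 1, int(value * bins))


def qd_search(
    seed: str,
    mutate: Callable[[str, random.Random], str],
    quality: Callable[[str], float],
    diversity: Callable[[str], float],
    bins: int,
    steps: int,
    rng_seed: int = 0,
) -> Dict[int, Tuple[float, str]]:
    rng = random.Random(rng_seed)
    archive = {bin_index(diversity(seed), bins): (quality(seed), seed)}
    for _ in range(steps):
        _, parent = rng.choice(list(archive.values()))
        child = mutate(parent, rng)
        cell, score = bin_index(diversity(child), bins), quality(child)
        # Keep the child if its cell is empty or it beats the current occupant.
        if cell not in archive or score > archive[cell][0]:
            archive[cell] = (score, child)
    return archive


def toy_mutate(text: str, rng: random.Random) -> str:
    # Hypothetical stand-in for LM-generated variation: edit one character.
    chars = list(text)
    op = rng.choice(["insert", "delete", "flip"])
    pos = rng.randrange(len(chars))
    if op == "insert":
        chars.insert(pos, rng.choice("ab"))
    elif op == "delete" and len(chars) > 1:
        del chars[pos]
    else:
        chars[pos] = "b" if chars[pos] == "a" else "a"
    return "".join(chars)


def toy_quality(text: str) -> float:
    # Hypothetical stand-in for LM quality feedback: prefer length 10.
    return 1.0 / (1.0 + abs(len(text) - 10))


def toy_diversity(text: str) -> float:
    # Hypothetical stand-in for LM diversity feedback: fraction of "a".
    return text.count("a") / len(text)


archive = qd_search("ab", toy_mutate, toy_quality, toy_diversity, bins=5, steps=500)
for cell in sorted(archive):
    print(cell, archive[cell])
```

Because a candidate competes only with the occupant of its own cell, a strong candidate in one region cannot crowd out candidates elsewhere on the diversity axis.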


Quality Diversity through Human Feedback

Oct 18, 2023

Li Ding, Jenny Zhang, Jeff Clune, Lee Spector, Joel Lehman

Abstract: Reinforcement learning from human feedback (RLHF) has exhibited the potential to enhance the performance of foundation models for qualitative tasks. Despite its promise, its efficacy is often restricted when conceptualized merely as a mechanism to maximize learned reward models of averaged human preferences, especially in areas such as image generation which demand diverse model responses. Meanwhile, quality diversity (QD) algorithms, dedicated to seeking diverse, high-quality solutions, are often constrained by the dependency on manually defined diversity metrics. Interestingly, such limitations of RLHF and QD can be overcome by blending insights from both. This paper introduces Quality Diversity through Human Feedback (QDHF), which employs human feedback for inferring diversity metrics, expanding the applicability of QD algorithms. Empirical results reveal that QDHF outperforms existing QD methods regarding automatic diversity discovery, and matches the search capabilities of QD with human-constructed metrics. Notably, when deployed for a latent space illumination task, QDHF markedly enhances the diversity of images generated by a Diffusion model. The study concludes with an in-depth analysis of QDHF's sample efficiency and the quality of its derived diversity metrics, emphasizing its promise for enhancing exploration and diversity in optimization for complex, open-ended tasks.


OMNI: Open-endedness via Models of human Notions of Interestingness

Jun 02, 2023

Jenny Zhang, Joel Lehman, Kenneth Stanley, Jeff Clune

Abstract: Open-ended algorithms aim to learn new, interesting behaviors forever. That requires a vast environment search space, but there are thus infinitely many possible tasks. Even after filtering for tasks the current agent can learn (i.e., learning progress), countless learnable yet uninteresting tasks remain (e.g., minor variations of previously learned tasks). An Achilles Heel of open-endedness research is the inability to quantify (and thus prioritize) tasks that are not just learnable, but also interesting (e.g., worthwhile and novel). We propose solving this problem by Open-endedness via Models of human Notions of Interestingness (OMNI). The insight is that we can utilize large (language) models (LMs) as a model of interestingness (MoI), because they already internalize human concepts of interestingness from training on vast amounts of human-generated data, where humans naturally write about what they find interesting or boring. We show that LM-based MoIs improve open-ended learning by focusing on tasks that are both learnable and interesting, outperforming baselines based on uniform task sampling or learning progress alone. This approach has the potential to dramatically advance the ability to intelligently select which tasks to focus on next (i.e., auto-curricula), and could be seen as AI selecting its own next task to learn, facilitating self-improving AI and AI-Generating Algorithms.

* 33 pages, 22 figures
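
The two-stage selection in the abstract, filtering for learnable tasks and then for interesting ones, can be sketched as a function that turns per-task learning progress into sampling weights. This is a minimal sketch under stated assumptions: `task_sampling_weights` is a hypothetical name, the task names and progress values are made up, and a simple predicate stands in for the LM-based model of interestingness.

```python
from typing import Callable, Dict


def task_sampling_weights(
    learning_progress: Dict[str, float],
    is_interesting: Callable[[str], bool],
) -> Dict[str, float]:
    """Weight tasks by learning progress, zeroing out uninteresting ones."""
    kept = {
        task: max(progress, 0.0)
        for task, progress in learning_progress.items()
        if is_interesting(task)
    }
    weights = {task: 0.0 for task in learning_progress}
    if not kept:
        return weights
    total = sum(kept.values())
    for task, progress in kept.items():
        # Fall back to uniform weights if no kept task shows any progress.
        weights[task] = progress / total if total > 0 else 1.0 / len(kept)
    return weights


# Hypothetical tasks; "twice" marks a minor variation of a learned task.
progress = {"collect wood": 0.30, "collect wood twice": 0.25, "craft table": 0.10}
weights = task_sampling_weights(progress, lambda task: "twice" not in task)
print(weights)
```

Learning progress alone would give the near-duplicate task a large share of the sampling budget; the interestingness filter removes it while leaving the relative weights of the remaining tasks unchanged.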


Perceptive Locomotion with Controllable Pace and Natural Gait Transitions Over Uneven Terrains

Jan 30, 2023

Daniel Chee Hian Tan, Jenny Zhang, Michael Chuah, Zhibin Li


Abstract: This work developed a learning framework for perceptive legged locomotion that combines visual feedback, proprioceptive information, and active gait regulation of foot-ground contacts. The perception requires only one forward-facing camera to obtain the heightmap, and the active regulation of gait paces and traveling velocity are realized through our formulation of CPG-based high-level imitation of foot-ground contacts. Through this framework, an end-user has the ability to command task-level inputs to control different walking speeds and gait frequencies according to the traversal of different terrains, which enables more reliable negotiation with encountered obstacles. The results demonstrated that the learned perceptive locomotion policy followed task-level control inputs with intended behaviors, and was robust in the presence of unseen terrains and external force perturbations. A video demonstration can be found at https://youtu.be/OTzlWzDfAe8, and the codebase at https://github.com/jennyzzt/perceptual-locomotion.


Good Time to Ask: A Learning Framework for Asking for Help in Embodied Visual Navigation

Jun 20, 2022

Jenny Zhang, Samson Yu, Jiafei Duan, Cheston Tan


Abstract: In reality, it is often more efficient to ask for help than to search the entire space to find an object with an unknown location. We present a learning framework that enables an agent to actively ask for help in such embodied visual navigation tasks, where the feedback informs the agent of where the goal is in its view. To emulate the real-world scenario that a teacher may not always be present, we propose a training curriculum where feedback is not always available. We formulate an uncertainty measure of where the goal is and use empirical results to show that through this approach, the agent learns to ask for help effectively while remaining robust when feedback is not available.

* 11 Pages, 3 Figures
