Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Emily Dolson

The Robustness of Structural Features in Species Interaction Networks

Feb 24, 2025

Sanaz Hasanzadeh Fard, Emily Dolson

Abstract:Species interaction networks are a powerful tool for describing ecological communities; they typically contain nodes representing species, and edges representing interactions between those species. For the purposes of drawing abstract inferences about groups of similar networks, ecologists often use graph topology metrics to summarize structural features. However, gathering the data that underlies these networks is challenging, which can lead to some interactions being missed. Thus, it is important to understand how much different structural metrics are affected by missing data. To address this question, we analyzed a database of 148 real-world bipartite networks representing four different types of species interactions (pollination, host-parasite, plant-ant, and seed-dispersal). For each network, we measured six different topological properties: number of connected components, variance in node betweenness, variance in node PageRank, largest Eigenvalue, the number of non-zero Eigenvalues, and community detection as determined by four different algorithms. We then tested how these properties change as additional edges -- representing data that may have been missed -- are added to the networks. We found substantial variation in how robust different properties were to the missing data. For example, the Clauset-Newman-Moore and Louvain community detection algorithms showed much more gradual change as edges were added than the label propagation and Girvan-Newman algorithms did, suggesting that the former are more robust. Robustness also varied for some metrics based on interaction type. These results provide a foundation for selecting network properties to use when analyzing messy ecological network data.

Via

Access Paper or Ask Questions

A Guide to Tracking Phylogenies in Parallel and Distributed Agent-based Evolution Models

May 16, 2024

Matthew Andres Moreno, Anika Ranjan, Emily Dolson, Luis Zaman

Abstract:Computer simulations are an important tool for studying the mechanics of biological evolution. In particular, in silico work with agent-based models provides an opportunity to collect high-quality records of ancestry relationships among simulated agents. Such phylogenies can provide insight into evolutionary dynamics within these simulations. Existing work generally tracks lineages directly, yielding an exact phylogenetic record of evolutionary history. However, direct tracking can be inefficient for large-scale, many-processor evolutionary simulations. An alternate approach to extracting phylogenetic information from simulation that scales more favorably is post hoc estimation, akin to how bioinformaticians build phylogenies by assessing genetic similarities between organisms. Recently introduced ``hereditary stratigraphy'' algorithms provide means for efficient inference of phylogenetic history from non-coding annotations on simulated organisms' genomes. A number of options exist in configuring hereditary stratigraphy methodology, but no work has yet tested how they impact reconstruction quality. To address this question, we surveyed reconstruction accuracy under alternate configurations across a matrix of evolutionary conditions varying in selection pressure, spatial structure, and ecological dynamics. We synthesize results from these experiments to suggest a prescriptive system of best practices for work with hereditary stratigraphy, ultimately guiding researchers in choosing appropriate instrumentation for large-scale simulation studies.

Via

Access Paper or Ask Questions

Phylotrack: C++ and Python libraries for in silico phylogenetic tracking

May 15, 2024

Emily Dolson, Santiago Rodriguez-Papa, Matthew Andres Moreno

Abstract:In silico evolution instantiates the processes of heredity, variation, and differential reproductive success (the three "ingredients" for evolution by natural selection) within digital populations of computational agents. Consequently, these populations undergo evolution, and can be used as virtual model systems for studying evolutionary dynamics. This experimental paradigm -- used across biological modeling, artificial life, and evolutionary computation -- complements research done using in vitro and in vivo systems by enabling experiments that would be impossible in the lab or field. One key benefit is complete, exact observability. For example, it is possible to perfectly record all parent-child relationships across simulation history, yielding complete phylogenies (ancestry trees). This information reveals when traits were gained or lost, and also facilitates inference of underlying evolutionary dynamics. The Phylotrack project provides libraries for tracking and analyzing phylogenies in in silico evolution. The project is composed of 1) Phylotracklib: a header-only C++ library, developed under the umbrella of the Empirical project, and 2) Phylotrackpy: a Python wrapper around Phylotracklib, created with Pybind11. Both components supply a public-facing API to attach phylogenetic tracking to digital evolution systems, as well as a stand-alone interface for measuring a variety of popular phylogenetic topology metrics. Underlying design and C++ implementation prioritizes efficiency, allowing for fast generational turnover for agent populations numbering in the tens of thousands. Several explicit features (e.g., phylogeny pruning and abstraction, etc.) are provided for reducing the memory footprint of phylogenetic information.

Via

Access Paper or Ask Questions

Ecology, Spatial Structure, and Selection Pressure Induce Strong Signatures in Phylogenetic Structure

May 12, 2024

Matthew Andres Moreno, Santiago Rodriguez-Papa, Emily Dolson

Abstract:Evolutionary dynamics are shaped by a variety of fundamental, generic drivers, including spatial structure, ecology, and selection pressure. These drivers impact the trajectory of evolution, and have been hypothesized to influence phylogenetic structure. Here, we set out to assess (1) if spatial structure, ecology, and selection pressure leave detectable signatures in phylogenetic structure, (2) the extent, in particular, to which ecology can be detected and discerned in the presence of spatial structure, and (3) the extent to which these phylogenetic signatures generalize across evolutionary systems. To this end, we analyze phylogenies generated by manipulating spatial structure, ecology, and selection pressure within three computational models of varied scope and sophistication. We find that selection pressure, spatial structure, and ecology have characteristic effects on phylogenetic metrics, although these effects are complex and not always intuitive. Signatures have some consistency across systems when using equivalent taxonomic unit definitions (e.g., individual, genotype, species). Further, we find that sufficiently strong ecology can be detected in the presence of spatial structure. We also find that, while low-resolution phylogenetic reconstructions can bias some phylogenetic metrics, high-resolution reconstructions recapitulate them faithfully. Although our results suggest potential for evolutionary inference of spatial structure, ecology, and selection pressure through phylogenetic analysis, further methods development is needed to distinguish these drivers' phylometric signatures from each other and to appropriately normalize phylogenetic metrics. With such work, phylogenetic analysis could provide a versatile toolkit to study large-scale evolving populations.

Via

Access Paper or Ask Questions

Trackable Island-model Genetic Algorithms at Wafer Scale

May 06, 2024

Matthew Andres Moreno, Connor Yang, Emily Dolson, Luis Zaman

Figure 1 for Trackable Island-model Genetic Algorithms at Wafer Scale

Figure 2 for Trackable Island-model Genetic Algorithms at Wafer Scale

Abstract:Emerging ML/AI hardware accelerators, like the 850,000 processor Cerebras Wafer-Scale Engine (WSE), hold great promise to scale up the capabilities of evolutionary computation. However, challenges remain in maintaining visibility into underlying evolutionary processes while efficiently utilizing these platforms' large processor counts. Here, we focus on the problem of extracting phylogenetic information from digital evolution on the WSE platform. We present a tracking-enabled asynchronous island-based genetic algorithm (GA) framework for WSE hardware. Emulated and on-hardware GA benchmarks with a simple tracking-enabled agent model clock upwards of 1 million generations a minute for population sizes reaching 16 million. This pace enables quadrillions of evaluations a day. We validate phylogenetic reconstructions from these trials and demonstrate their suitability for inference of underlying evolutionary conditions. In particular, we demonstrate extraction of clear phylometric signals that differentiate wafer-scale runs with adaptive dynamics enabled versus disabled. Together, these benchmark and validation trials reflect strong potential for highly scalable evolutionary computation that is both efficient and observable. Kernel code implementing the island-model GA supports drop-in customization to support any fixed-length genome content and fitness criteria, allowing it to be leveraged to advance research interests across the community.

* arXiv admin note: substantial text overlap with arXiv:2404.10861

Via

Access Paper or Ask Questions

Trackable Agent-based Evolution Models at Wafer Scale

Apr 16, 2024

Matthew Andres Moreno, Connor Yang, Emily Dolson, Luis Zaman

Figure 1 for Trackable Agent-based Evolution Models at Wafer Scale

Figure 2 for Trackable Agent-based Evolution Models at Wafer Scale

Figure 3 for Trackable Agent-based Evolution Models at Wafer Scale

Figure 4 for Trackable Agent-based Evolution Models at Wafer Scale

Abstract:Continuing improvements in computing hardware are poised to transform capabilities for in silico modeling of cross-scale phenomena underlying major open questions in evolutionary biology and artificial life, such as transitions in individuality, eco-evolutionary dynamics, and rare evolutionary events. Emerging ML/AI-oriented hardware accelerators, like the 850,000 processor Cerebras Wafer Scale Engine (WSE), hold particular promise. However, practical challenges remain in conducting informative evolution experiments that efficiently utilize these platforms' large processor counts. Here, we focus on the problem of extracting phylogenetic information from agent-based evolution on the WSE platform. This goal drove significant refinements to decentralized in silico phylogenetic tracking, reported here. These improvements yield order-of-magnitude performance improvements. We also present an asynchronous island-based genetic algorithm (GA) framework for WSE hardware. Emulated and on-hardware GA benchmarks with a simple tracking-enabled agent model clock upwards of 1 million generations a minute for population sizes reaching 16 million agents. We validate phylogenetic reconstructions from these trials and demonstrate their suitability for inference of underlying evolutionary conditions. In particular, we demonstrate extraction, from wafer-scale simulation, of clear phylometric signals that differentiate runs with adaptive dynamics enabled versus disabled. Together, these benchmark and validation trials reflect strong potential for highly scalable agent-based evolution simulation that is both efficient and observable. Developed capabilities will bring entirely new classes of previously intractable research questions within reach, benefiting further explorations within the evolutionary biology and artificial life communities across a variety of emerging high-performance computing platforms.

Via

Access Paper or Ask Questions

On the Robustness of Lexicase Selection to Contradictory Objectives

Mar 11, 2024

Shakiba Shahbandegan, Emily Dolson

Abstract:Lexicase and epsilon-lexicase selection are state of the art parent selection techniques for problems featuring multiple selection criteria. Originally, lexicase selection was developed for cases where these selection criteria are unlikely to be in conflict with each other, but preliminary work suggests it is also a highly effective many-objective optimization algorithm. However, to predict whether these results generalize, we must understand lexicase selection's performance on contradictory objectives. Prior work has shown mixed results on this question. Here, we develop theory identifying circumstances under which lexicase selection will succeed or fail to find a Pareto-optimal solution. To make this analysis tractable, we restrict our investigation to a theoretical problem with maximally contradictory objectives. Ultimately, we find that lexicase and epsilon-lexicase selection each have a region of parameter space where they are incapable of optimizing contradictory objectives. Outside of this region, however, they perform well despite the presence of contradictory objectives. Based on these findings, we propose theoretically-backed guidelines for parameter choice. Additionally, we identify other properties that may affect whether a many-objective optimization problem is a good fit for lexicase or epsilon-lexicase selection.

Via

Access Paper or Ask Questions

Runtime phylogenetic analysis enables extreme subsampling for test-based problems

Feb 02, 2024

Alexander Lalejini, Marcos Sanson, Jack Garbus, Matthew Andres Moreno, Emily Dolson

Abstract:A phylogeny describes the evolutionary history of an evolving population. Evolutionary search algorithms can perfectly track the ancestry of candidate solutions, illuminating a population's trajectory through the search space. However, phylogenetic analyses are typically limited to post-hoc studies of search performance. We introduce phylogeny-informed subsampling, a new class of subsampling methods that exploit runtime phylogenetic analyses for solving test-based problems. Specifically, we assess two phylogeny-informed subsampling methods -- individualized random subsampling and ancestor-based subsampling -- on three diagnostic problems and ten genetic programming (GP) problems from program synthesis benchmark suites. Overall, we found that phylogeny-informed subsampling methods enable problem-solving success at extreme subsampling levels where other subsampling methods fail. For example, phylogeny-informed subsampling methods more reliably solved program synthesis problems when evaluating just one training case per-individual, per-generation. However, at moderate subsampling levels, phylogeny-informed subsampling generally performed no better than random subsampling on GP problems. Our diagnostic experiments show that phylogeny-informed subsampling improves diversity maintenance relative to random subsampling, but its effects on a selection scheme's capacity to rapidly exploit fitness gradients varied by selection scheme. Continued refinements of phylogeny-informed subsampling techniques offer a promising new direction for scaling up evolutionary systems to handle problems with many expensive-to-evaluate fitness criteria.

* Conference submission

Via

Access Paper or Ask Questions

Reachability Analysis for Lexicase Selection via Community Assembly Graphs

Sep 20, 2023

Emily Dolson, Alexander Lalejini

Abstract:Fitness landscapes have historically been a powerful tool for analyzing the search space explored by evolutionary algorithms. In particular, they facilitate understanding how easily reachable an optimal solution is from a given starting point. However, simple fitness landscapes are inappropriate for analyzing the search space seen by selection schemes like lexicase selection in which the outcome of selection depends heavily on the current contents of the population (i.e. selection schemes with complex ecological dynamics). Here, we propose borrowing a tool from ecology to solve this problem: community assembly graphs. We demonstrate a simple proof-of-concept for this approach on an NK Landscape where we have perfect information. We then demonstrate that this approach can be successfully applied to a complex genetic programming problem. While further research is necessary to understand how to best use this tool, we believe it will be a valuable addition to our toolkit and facilitate analyses that were previously impossible.

Via

Access Paper or Ask Questions

Phylogeny-informed fitness estimation

Jun 06, 2023

Alexander Lalejini, Matthew Andres Moreno, Jose Guadalupe Hernandez, Emily Dolson

Abstract:Phylogenies (ancestry trees) depict the evolutionary history of an evolving population. In evolutionary computing, a phylogeny can reveal how an evolutionary algorithm steers a population through a search space, illuminating the step-by-step process by which any solutions evolve. Thus far, phylogenetic analyses have primarily been applied as post-hoc analyses used to deepen our understanding of existing evolutionary algorithms. Here, we investigate whether phylogenetic analyses can be used at runtime to augment parent selection procedures during an evolutionary search. Specifically, we propose phylogeny-informed fitness estimation, which exploits a population's phylogeny to estimate fitness evaluations. We evaluate phylogeny-informed fitness estimation in the context of the down-sampled lexicase and cohort lexicase selection algorithms on two diagnostic analyses and four genetic programming (GP) problems. Our results indicate that phylogeny-informed fitness estimation can mitigate the drawbacks of down-sampled lexicase, improving diversity maintenance and search space exploration. However, the extent to which phylogeny-informed fitness estimation improves problem-solving success for GP varies by problem, subsampling method, and subsampling level. This work serves as an initial step toward improving evolutionary algorithms by exploiting runtime phylogenetic analysis.

* Submitted as contribution to GPTP XX

Via

Access Paper or Ask Questions