Abstract:Aerosol effects on climate, weather, and air quality depend on characteristics of individual particles, which are tremendously diverse and change in time. Particle-resolved models are the only models able to capture this diversity in particle physiochemical properties, and these models are computationally expensive. As a strategy for accelerating particle-resolved microphysics models, we introduce Graph-based Learning of Aerosol Dynamics (GLAD) and use this model to train a surrogate of the particle-resolved model PartMC-MOSAIC. GLAD implements a Graph Network-based Simulator (GNS), a machine learning framework that has been used to simulate particle-based fluid dynamics models. In GLAD, each particle is represented as a node in a graph, and the evolution of the particle population over time is simulated through learned message passing. We demonstrate our GNS approach on a simple aerosol system that includes condensation of sulfuric acid onto particles composed of sulfate, black carbon, organic carbon, and water. A graph with particles as nodes is constructed, and a graph neural network (GNN) is then trained using the model output from PartMC-MOSAIC. The trained GNN can then be used for simulating and predicting aerosol dynamics over time. Results demonstrate the framework's ability to accurately learn chemical dynamics and generalize across different scenarios, achieving efficient training and prediction times. We evaluate the performance across three scenarios, highlighting the framework's robustness and adaptability in modeling aerosol microphysics and chemistry.
Abstract:We explore the application of machine learning algorithms to predict the suitability of Russet potato clones for advancement in breeding trials. Leveraging data from manually collected trials in the state of Oregon, we investigate the potential of a wide variety of state-of-the-art binary classification models. We conduct a comprehensive analysis of the dataset that includes preprocessing, feature engineering, and imputation to address missing values. We focus on several key metrics such as accuracy, F1-score, and Matthews correlation coefficient (MCC) for model evaluation. The top-performing models, namely the multi-layer perceptron (MLPC), histogram-based gradient boosting classifier (HGBC), and a support vector machine (SVC), demonstrate consistent and significant results. Variable selection further enhances model performance and identifies influential features in predicting trial outcomes. The findings emphasize the potential of machine learning in streamlining the selection process for potato varieties, offering benefits such as increased efficiency, substantial cost savings, and judicious resource utilization. Our study contributes insights into precision agriculture and showcases the relevance of advanced technologies for informed decision-making in breeding programs.
Abstract:Persistence diagrams are used as signatures of point cloud data assumed to be sampled from manifolds, and represent their topology in a compact fashion. Further, two given clouds of points can be compared by directly comparing their persistence diagrams using the bottleneck distance, d_B. But one potential drawback of this pipeline is that point clouds sampled from topologically similar manifolds can have arbitrarily large d_B values when there is a large degree of scaling between them. This situation is typical in dimension reduction frameworks that are also aiming to preserve topology. We define a new scale-invariant distance between persistence diagrams termed normalized bottleneck distance, d_N, and study its properties. In defining d_N, we also develop a broader framework called metric decomposition for comparing finite metric spaces of equal cardinality with a bijection. We utilize metric decomposition to prove a stability result for d_N by deriving an explicit bound on the distortion of the associated bijective map. We then study two popular dimension reduction techniques, Johnson-Lindenstrauss (JL) projections and metric multidimensional scaling (mMDS), and a third class of general biLipschitz mappings. We provide new bounds on how well these dimension reduction techniques preserve homology with respect to d_N. For a JL map f that transforms input X to f(X), we show that d_N(dgm(X),dgm(f(X)) < e, where dgm(X) is the Vietoris-Rips persistence diagram of X, and 0 < e < 1 is the tolerance up to which pairwise distances are preserved by f. For mMDS, we present new bounds for both d_B and d_N between persistence diagrams of X and its projection in terms of the eigenvalues of the covariance matrix. And for k-biLipschitz maps, we show that d_N is bounded by the product of (k^2-1)/k and the ratio of diameters of X and f(X).
Abstract:We develop a framework that creates a new polygonal mesh representation of the 3D domain of a layer-by-layer 3D printing job on which we identify single, continuous tool paths covering each connected piece of the domain in every layer. We present a tool path algorithm that traverses each such continuous tool path with no crossovers. The key construction at the heart of our framework is a novel Euler transformation that we introduced recently in a separate manuscript. Our Euler transformation converts a 2-dimensional cell complex K into a new 2-complex K^ such that every vertex in the 1-skeleton G^ of K^ has degree 4. Hence G^ is Eulerian, and an Eulerian tour can be followed to print all edges in a continuous fashion without stops. We start with a mesh K of the union of polygons obtained by projecting all layers to the plane. First we compute its Euler transformation K^. In the slicing step, we clip K^ at each layer i using its polygon to obtain K^_i. We then patch K^_i by adding edges such that any odd-degree nodes created by slicing are transformed to have even degrees again. We print extra support edges in place of any segments left out to ensure there are no edges without support in the next layer above. These support edges maintain the Euler nature of K^_i. Finally, we describe a tree-based search algorithm that builds the continuous tool path by traversing "concentric" cycles in the Euler complex. Our algorithm produces a tool path that avoids material collisions and crossovers, and can be printed in a continuous fashion irrespective of complex geometry or topology of the domain (e.g., holes).
Abstract:The contributions of this paper are two-fold. We define a new filtration called the cover filtration built from a single cover based on a generalized Jaccard distance. We provide stability results for the cover filtration and show how the construction is equivalent to the Cech filtration under certain settings. We then develop a language and theory for stable paths within this filtration, inspired by ideas of persistent homology. We demonstrate how the filtration and paths can be applied to a variety of applications in which defining a metric is not obvious but a cover is readily available. We demonstrate the usefulness of this construction by employing it in the context of recommendation systems and explainable machine learning. We demonstrate a new perspective for modeling recommendation system data sets that does not require manufacturing a bespoke metric. This extends work on graph-based recommendation systems, allowing a topological perspective. For an explicit example, we look at a movies data set and we find the stable paths identified in our framework represent a sequence of movies constituting a gentle transition and ordering from one genre to another. For explainable machine learning, we apply the Mapper for model induction, providing explanations in the form of paths between subpopulations or observations. Our framework provides an alternative way of building a filtration from a single mapper that is then used to explore stable paths. As a direct illustration, we build a mapper from a supervised machine learning model trained on the FashionMNIST data set. We show that the stable paths in the cover filtration provide improved explanations of relationships between subpopulations of images.