Abstract:Analysis of compressible turbulent flows is essential for applications related to propulsion, energy generation, and the environment. Here, we present BLASTNet 2.0, a 2.2 TB network-of-datasets containing 744 full-domain samples from 34 high-fidelity direct numerical simulations, which addresses the current limited availability of 3D high-fidelity reacting and non-reacting compressible turbulent flow simulation data. With this data, we benchmark a total of 49 variations of five deep learning approaches for 3D super-resolution, which can be applied to improve scientific imaging, simulations, and turbulence models, as well as in computer vision applications. We perform neural scaling analysis on these models to examine the performance of different machine learning (ML) approaches, including two scientific ML techniques. We demonstrate that (i) predictive performance can scale with model size and cost, (ii) architecture matters significantly, especially for smaller models, and (iii) the benefits of physics-based losses can persist with increasing model size. The outcomes of this benchmark study are anticipated to offer insights that can aid the design of 3D super-resolution models, especially for turbulence models, while this data is expected to foster ML methods for a broad range of flow physics applications. This data is publicly available with download links and browsing tools consolidated at https://blastnet.github.io.
Abstract:The increasing incidence and severity of wildfires underscore the necessity of accurately predicting their behavior. While high-fidelity models derived from first principles offer physical accuracy, they are too computationally expensive for use in real-time fire response. Low-fidelity models sacrifice some physical accuracy and generalizability via the integration of empirical measurements, but enable real-time simulations for operational use in fire response. Machine learning techniques offer the ability to bridge these objectives by learning first-principles physics while achieving computational speedup. While deep learning approaches have demonstrated the ability to predict wildfire propagation over large time periods, time-resolved fire-spread predictions are needed for active fire management. In this work, we evaluate the ability of deep learning approaches to accurately model the time-resolved dynamics of wildfires. We use an autoregressive process in which a convolutional recurrent deep learning model makes predictions that propagate a wildfire over 15-minute increments. We apply the model to three simulated datasets of increasing complexity, containing both field fires with homogeneous fuel distribution and real-world topologies sampled from the California region of the United States. We show that even after 100 autoregressive predictions representing more than 24 hours of simulated fire spread, the resulting models generate stable and realistic propagation dynamics, achieving a Jaccard score between 0.89 and 0.94 when predicting the resulting fire scar.
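The autoregressive procedure and the Jaccard score mentioned in this abstract can be illustrated with a minimal sketch. Everything named here (`rollout`, `toy_step`, `jaccard_score`) is hypothetical and chosen for illustration; `toy_step` is a trivial nearest-neighbor spread rule standing in for the convolutional recurrent model, which the abstract does not specify in detail.

```python
import numpy as np

def jaccard_score(pred_mask, true_mask):
    """Intersection over union of two boolean burn masks."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    union = np.logical_or(pred, true).sum()
    if union == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return float(np.logical_and(pred, true).sum() / union)

def rollout(step_fn, initial_state, n_steps):
    """Autoregressive rollout: each prediction becomes the next input."""
    state = initial_state
    states = []
    for _ in range(n_steps):
        state = step_fn(state)
        states.append(state)
    return states

def toy_step(burned):
    """Stand-in for a learned model: fire spreads to the 4 nearest neighbors."""
    padded = np.pad(burned, 1)  # pad with False so the fire does not wrap around
    return (padded[1:-1, 1:-1] | padded[:-2, 1:-1] | padded[2:, 1:-1]
            | padded[1:-1, :-2] | padded[1:-1, 2:])

# Single ignition point at the center of a 5x5 grid (illustrative values only).
start = np.zeros((5, 5), dtype=bool)
start[2, 2] = True
states = rollout(toy_step, start, n_steps=2)

# Compare the fire scar after one step with the scar after two steps.
score = jaccard_score(states[0], states[1])
```

With 15-minute increments, 100 such steps would correspond to 25 hours of simulated spread, consistent with the "more than 24 hours" quoted in the abstract.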
Abstract:In general, large datasets enable deep learning models to perform with good accuracy and generalizability. However, massive high-fidelity simulation datasets (from molecular chemistry, astrophysics, computational fluid dynamics (CFD), etc.) can be challenging to curate due to dimensionality and storage constraints. Lossy compression algorithms can help mitigate limitations from storage, as long as the overall data fidelity is preserved. To illustrate this point, we demonstrate that deep learning models, trained and tested on data from a petascale CFD simulation, are robust to errors introduced during lossy compression in a semantic segmentation problem. Our results demonstrate that lossy compression algorithms offer a realistic pathway for exposing high-fidelity scientific data to open-source data repositories for building community datasets. In this paper, we outline, construct, and evaluate the requirements for establishing a big data framework, demonstrated at https://blastnet.github.io/, for scientific machine learning.
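As a rough illustration of what "errors introduced during lossy compression" can look like, the sketch below applies uniform scalar quantization with an absolute error bound to a synthetic 3D field. This is an assumed, generic scheme chosen for clarity; the abstract does not name the compressor that was used, and the function names and the random field are hypothetical.

```python
import numpy as np

def quantize(field, abs_error_bound):
    """Map floats to integer codes on a grid of spacing 2 * abs_error_bound."""
    step = 2.0 * abs_error_bound
    return np.round(field / step).astype(np.int32)

def dequantize(codes, abs_error_bound):
    """Reconstruct floats from integer codes; the error is at most the bound."""
    return codes.astype(np.float64) * (2.0 * abs_error_bound)

# Synthetic stand-in for a simulation field (not real CFD data).
rng = np.random.default_rng(0)
field = rng.normal(size=(16, 16, 16))

eps = 1e-2  # assumed absolute error tolerance
codes = quantize(field, eps)
recon = dequantize(codes, eps)
max_err = float(np.max(np.abs(recon - field)))
```

In a scheme of this kind, the integer codes would typically be passed to a lossless entropy coder for the actual storage savings; the point of the sketch is only that the reconstruction error stays within a user-chosen tolerance.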
Abstract:Predicting wildfire spread is critical for land management and disaster preparedness. To this end, we present "Next Day Wildfire Spread," a curated, large-scale, multivariate data set of historical wildfires aggregating nearly a decade of remote-sensing data across the United States. In contrast to existing fire data sets based on Earth observation satellites, our data set combines 2D fire data with multiple explanatory variables (e.g., topography, vegetation, weather, drought index, population density) aligned over 2D regions, providing a feature-rich data set for machine learning. To demonstrate the usefulness of this data set, we implement a convolutional autoencoder that takes advantage of the spatial information of this data to predict wildfire spread. We compare the performance of the neural network with other machine learning models: logistic regression and random forest. This data set can be used as a benchmark for developing wildfire propagation models based on remote sensing data for a lead time of one day.
Abstract:Many practical combustion systems such as those in rockets, gas turbines, and internal combustion engines operate under high pressures that surpass the thermodynamic critical limit of fuel-oxidizer mixtures. These conditions require the consideration of complex fluid behaviors that pose challenges for numerical simulations, casting doubts on the validity of existing subgrid-scale (SGS) models in large-eddy simulations of these systems. While data-driven methods have shown high accuracy as closure models in simulations of turbulent flames, these models are often criticized for lack of physical interpretability, wherein they provide answers but no insight into their underlying rationale. The objective of this study is to assess SGS stress models from conventional physics-driven approaches and an interpretable machine learning algorithm, i.e., the random forest regressor, in a turbulent transcritical non-premixed flame. To this end, direct numerical simulations (DNS) of transcritical liquid-oxygen/gaseous-methane (LOX/GCH4) inert and reacting flows are performed. Using this data, a priori analysis is performed on the Favre-filtered DNS data to examine the accuracy of physics-based and random forest SGS models under these conditions. SGS stresses calculated with the gradient model show good agreement with the exact terms extracted from filtered DNS. The accuracy of the random forest regressor decreases when physics-based constraints are applied to the feature set. Results demonstrate that random forests can perform as effectively as algebraic models when modeling subgrid stresses, but only when trained on a sufficiently representative database. The random forest feature importance score is shown to provide insight into discovering subgrid-scale stresses through sparse regression.
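For readers unfamiliar with the terms used in this abstract, the standard textbook definitions are sketched below. The abstract itself does not give formulas, so the notation and the coefficient 1/12 (the common form of the gradient model for a box or Gaussian filter of width Δ) are assumptions taken from the general large-eddy simulation literature, not from the paper.

```latex
% Favre (density-weighted) filtering of a quantity \phi,
% where the overbar denotes the spatial filter and \rho the density:
\tilde{\phi} = \frac{\overline{\rho \phi}}{\bar{\rho}}

% Exact SGS stress extracted from filtered DNS:
\tau_{ij}^{\mathrm{sgs}} = \bar{\rho}\left( \widetilde{u_i u_j} - \tilde{u}_i \tilde{u}_j \right)

% Gradient model approximation (summation over the repeated index k):
\tau_{ij}^{\mathrm{sgs}} \approx \bar{\rho}\, \frac{\Delta^2}{12}\,
  \frac{\partial \tilde{u}_i}{\partial x_k}\,
  \frac{\partial \tilde{u}_j}{\partial x_k}
```

An a priori analysis in this sense compares the modeled stress on the last line against the exact stress on the second line, both evaluated from the same filtered DNS fields.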
Abstract:As the climate changes, the severity of wildland fires is expected to worsen. Understanding, controlling and mitigating these fires requires building models to accurately capture the fire-propagation dynamics. Supervised machine learning techniques provide a potential approach for developing such models. The objective of this study is to evaluate the feasibility of using the Convolutional Long Short-Term Memory (ConvLSTM) recurrent neural network (RNN) to model the dynamics of wildland fire propagation. The model is trained on simulated wildfire data generated by a cellular automaton percolation model. Four simulated datasets of increasing complexity are analyzed. The simplest dataset includes a constant wind direction as a single confounding factor, whereas the most complex dataset includes dynamic wind, complex terrain, spatially varying moisture content and realistic vegetation density distributions. We examine how effectively the ConvLSTM can capture the fire-spread dynamics over consecutive time steps using classification and regression metrics. It is shown that these ConvLSTMs are capable of capturing local fire transmission events, as well as the overall fire dynamics, such as the rate at which the fire spreads. Finally, we demonstrate that ConvLSTMs outperform non-temporal Convolutional Neural Networks (CNNs), particularly on the most difficult dataset.
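A cellular automaton percolation model of the kind mentioned in this abstract can be sketched in a few lines. The version below is a deliberately simplified assumption: a uniform spread probability, four-cell neighborhoods, and no wind, terrain, moisture, or vegetation effects, all of which the paper's datasets do include. The state codes and the name `ca_step` are hypothetical.

```python
import numpy as np

UNBURNED, BURNING, BURNED = 0, 1, 2

def ca_step(grid, p_spread, rng):
    """Advance the fire one step: burning cells ignite neighbors, then burn out."""
    burning = grid == BURNING
    padded = np.pad(burning, 1)  # pad with False so the fire stops at the edges
    # Count burning cells among the four nearest neighbors of every cell.
    n_burning = (padded[:-2, 1:-1].astype(int) + padded[2:, 1:-1]
                 + padded[1:-1, :-2] + padded[1:-1, 2:])
    # Each burning neighbor ignites the cell independently with probability p_spread.
    p_ignite = 1.0 - (1.0 - p_spread) ** n_burning
    ignite = (grid == UNBURNED) & (rng.random(grid.shape) < p_ignite)
    new_grid = grid.copy()
    new_grid[burning] = BURNED
    new_grid[ignite] = BURNING
    return new_grid

# Single ignition at the center of a 5x5 domain; p_spread = 1 makes it deterministic.
rng = np.random.default_rng(0)
grid = np.full((5, 5), UNBURNED)
grid[2, 2] = BURNING
after_one = ca_step(grid, p_spread=1.0, rng=rng)
after_two = ca_step(after_one, p_spread=1.0, rng=rng)
```

Sequences of grids produced this way are the sort of time-resolved data on which a recurrent model such as a ConvLSTM could be trained to predict the next frame.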
Abstract:Identifying regions that have a high likelihood of wildfires is a key component of land and forestry management and disaster preparedness. We create a data set by aggregating nearly a decade of remote-sensing data and historical fire records to predict wildfires. This prediction problem is framed as three machine learning tasks. Results are compared and analyzed for four different deep learning models to estimate wildfire likelihood. The results demonstrate that deep learning models can successfully identify areas of high fire likelihood using aggregated data about vegetation, weather, and topography with an AUC of 83%.
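The AUC quoted in this abstract is presumably the area under the ROC curve, which equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. A minimal sketch of that pairwise definition follows; the function name and the toy labels and scores are illustrative assumptions, not data from the paper.

```python
import numpy as np

def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels]
    neg = scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUC needs at least one positive and one negative sample")
    # Compare every positive score with every negative score; ties count half.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (pos.size * neg.size))

# Toy example: 1 = fire observed, 0 = no fire (illustrative values only).
fire_labels = [0, 0, 1, 1]
fire_scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(fire_labels, fire_scores)  # 3 of 4 pairs ranked correctly -> 0.75
```

The pairwise form is quadratic in the number of samples, so for large data sets a rank-based or library implementation would be the practical choice.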
Abstract:In this investigation, we outline a data-assisted approach that employs random forest classifiers for local and dynamic combustion submodel assignment in turbulent-combustion simulations. This method is applied in simulations of a single-element GOX/GCH4 rocket combustor; a priori as well as a posteriori assessments are conducted to (i) evaluate the accuracy and adjustability of the classifier for targeting different quantities-of-interest (QoIs), and (ii) assess improvements, resulting from the data-assisted combustion model assignment, in predicting target QoIs during simulation runtime. Results from the a priori study show that random forests, trained with local flow properties as input variables and combustion model errors as training labels, assign three different combustion models - finite-rate chemistry (FRC), flamelet progress variable (FPV) model, and inert mixing (IM) - with reasonable classification performance even when targeting multiple QoIs. Applications in a posteriori studies demonstrate improved predictions from data-assisted simulations, in temperature and CO mass fraction, when compared with monolithic FPV calculations. These results demonstrate that this data-driven framework holds promise for the dynamic combustion submodel assignment in reacting flow simulations.