Abstract:Recent advances in deep learning have inspired numerous works on data-driven solutions to partial differential equation (PDE) problems. These neural PDE solvers can often be much faster than their numerical counterparts; however, each presents its unique limitations and generally balances training cost, numerical accuracy, and ease of applicability to different problem setups. To address these limitations, we introduce several methods to apply latent diffusion models to physics simulation. Firstly, we introduce a mesh autoencoder to compress arbitrarily discretized PDE data, allowing for efficient diffusion training across various physics. Furthermore, we investigate full spatio-temporal solution generation to mitigate autoregressive error accumulation. Lastly, we investigate conditioning on initial physical quantities, as well as conditioning solely on a text prompt to introduce text2PDE generation. We show that language can be a compact, interpretable, and accurate modality for generating physics simulations, paving the way for more usable and accessible PDE solvers. Through experiments on both uniform and structured grids, we show that the proposed approach is competitive with current neural PDE solvers in both accuracy and efficiency, with promising scaling behavior up to $\sim$3 billion parameters. By introducing a scalable, accurate, and usable physics simulator, we hope to bring neural PDE solvers closer to practical use.
Abstract:Pretraining for partial differential equation (PDE) modeling has recently shown promise in scaling neural operators across datasets to improve generalizability and performance. Despite these advances, our understanding of how pretraining affects neural operators is still limited; studies generally propose tailored architectures and datasets that make it challenging to compare or examine different pretraining frameworks. To address this, we compare various pretraining methods without optimizing architecture choices to characterize pretraining dynamics on different models and datasets as well as to understand its scaling and generalization behavior. We find that pretraining is highly dependent on model and dataset choices, but in general transfer learning or physics-based pretraining strategies work best. In addition, pretraining performance can be further improved by using data augmentations. Lastly, pretraining is additionally beneficial when fine-tuning in scarce data regimes or when generalizing to downstream data similar to the pretraining distribution. Through providing insights into pretraining neural operators for physics prediction, we hope to motivate future work in developing and evaluating pretraining methods for PDEs.
Abstract:Accurate weather forecasting is crucial in various sectors, impacting decision-making processes and societal events. Data-driven approaches based on machine learning models have recently emerged as a promising alternative to numerical weather prediction models given their potential to capture physics of different scales from historical data and the significantly lower computational cost during the prediction stage. Renowned for its state-of-the-art performance across diverse domains, the Transformer model has also gained popularity in machine learning weather prediction. Yet applying Transformer architectures to weather forecasting, particularly on a global scale is computationally challenging due to the quadratic complexity of attention and the quadratic increase in spatial points as resolution increases. In this work, we propose a factorized-attention-based model tailored for spherical geometries to mitigate this issue. More specifically, it utilizes multi-dimensional factorized kernels that convolve over different axes where the computational complexity of the kernel is only quadratic to the axial resolution instead of overall resolution. The deterministic forecasting accuracy of the proposed model on $1.5^\circ$ and 0-7 days' lead time is on par with state-of-the-art purely data-driven machine learning weather prediction models. We also showcase the proposed model holds great potential to push forward the Pareto front of accuracy-efficiency for Transformer weather models, where it can achieve better accuracy with less computational cost compared to Transformer based models with standard attention.
Abstract:Neural solvers for partial differential equations (PDEs) have great potential, yet their practicality is currently limited by their generalizability. PDEs evolve over broad scales and exhibit diverse behaviors; predicting these phenomena will require learning representations across a wide variety of inputs, which may encompass different coefficients, geometries, or equations. As a step towards generalizable PDE modeling, we adapt masked pretraining for PDEs. Through self-supervised learning across PDEs, masked autoencoders can learn useful latent representations for downstream tasks. In particular, masked pretraining can improve coefficient regression and timestepping performance of neural solvers on unseen equations. We hope that masked pretraining can emerge as a unifying method across large, unlabeled, and heterogeneous datasets to learn latent physics at scale.
Abstract:The growth of deep learning in the past decade has motivated important applications to smart manufacturing and machine health monitoring. In particular, vibration data offers a rich and reliable source to provide meaningful insights into machine health and predictive maintenance. In this work, we present a Transformer based framework for analyzing vibration signals to predict different types of bearing faults (FaultFormer). In particular, we process signal data using data augmentations and extract their Fourier modes to train a transformer encoder to achieve state of the art accuracies. The attention mechanism as well as model outputs were analyzed to confirm the transformer's ability to automatically extract features within signals and learn both global and local relationships to make classifications. Lastly, two pretraining strategies were proposed to pave the way for large, generalizable transformers that could adapt to new data, situations, or machinery on the production floor.