Abstract:Time series forecasting plays a pivotal role in a wide range of applications, including weather prediction, healthcare, structural health monitoring, predictive maintenance, energy systems, and financial markets. While models such as LSTM, GRU, Transformers, and State-Space Models (SSMs) have become standard tools in this domain, selecting the optimal architecture remains a challenge. Performance comparisons often depend on evaluation metrics and the datasets under analysis, making the choice of a universally optimal model controversial. In this work, we introduce a flexible automated framework for time series forecasting that systematically designs and evaluates diverse network architectures by integrating LSTM, GRU, multi-head Attention, and SSM blocks. Using a multi-objective optimization approach, our framework determines the number, sequence, and combination of blocks to align with specific requirements and evaluation objectives. From the resulting Pareto-optimal architectures, the best model for a given context is selected via a user-defined preference function. We validate our framework across four distinct real-world applications. Results show that a single-layer GRU or LSTM is usually optimal when minimizing training time alone. However, when maximizing accuracy or balancing multiple objectives, the best architectures are often composite designs incorporating multiple block types in specific configurations. By employing a weighted preference function, users can resolve trade-offs between objectives, revealing novel, context-specific optimal architectures. Our findings underscore that no single neural architecture is universally optimal for time series forecasting. Instead, the best-performing model emerges as a data-driven composite architecture tailored to user-defined criteria and evaluation objectives.
Abstract:Uncertainty quantification (UQ) plays a pivotal role in scientific machine learning, especially when surrogate models are used to approximate complex systems. Although multilayer perceptions (MLPs) are commonly employed as surrogates, they often suffer from overfitting due to their large number of parameters. Kolmogorov-Arnold networks (KANs) offer an alternative solution with fewer parameters. However, gradient-based inference methods, such as Hamiltonian Monte Carlo (HMC), may result in computational inefficiency when applied to KANs, especially for large-scale datasets, due to the high cost of back-propagation.To address these challenges, we propose a novel approach, combining the dropout Tikhonov ensemble Kalman inversion (DTEKI) with Chebyshev KANs. This gradient-free method effectively mitigates overfitting and enhances numerical stability. Additionally, we incorporate the active subspace method to reduce the parameter-space dimensionality, allowing us to improve the accuracy of predictions and obtain more reliable uncertainty estimates.Extensive experiments demonstrate the efficacy of our approach in various test cases, including scenarios with large datasets and high noise levels. Our results show that the new method achieves comparable or better accuracy, much higher efficiency as well as stability compared to HMC, in addition to scalability. Moreover, by leveraging the low-dimensional parameter subspace, our method preserves prediction accuracy while substantially reducing further the computational cost.
Abstract:We introduce DeepVIVONet, a new framework for optimal dynamic reconstruction and forecasting of the vortex-induced vibrations (VIV) of a marine riser, using field data. We demonstrate the effectiveness of DeepVIVONet in accurately reconstructing the motion of an off--shore marine riser by using sparse spatio-temporal measurements. We also show the generalization of our model in extrapolating to other flow conditions via transfer learning, underscoring its potential to streamline operational efficiency and enhance predictive accuracy. The trained DeepVIVONet serves as a fast and accurate surrogate model for the marine riser, which we use in an outer--loop optimization algorithm to obtain the optimal locations for placing the sensors. Furthermore, we employ an existing sensor placement method based on proper orthogonal decomposition (POD) to compare with our data-driven approach. We find that that while POD offers a good approach for initial sensor placement, DeepVIVONet's adaptive capabilities yield more precise and cost-effective configurations.
Abstract:Inspired by the Kolmogorov-Arnold representation theorem and Kurkova's principle of using approximate representations, we propose the Kurkova-Kolmogorov-Arnold Network (KKAN), a new two-block architecture that combines robust multi-layer perceptron (MLP) based inner functions with flexible linear combinations of basis functions as outer functions. We first prove that KKAN is a universal approximator, and then we demonstrate its versatility across scientific machine-learning applications, including function regression, physics-informed machine learning (PIML), and operator-learning frameworks. The benchmark results show that KKANs outperform MLPs and the original Kolmogorov-Arnold Networks (KANs) in function approximation and operator learning tasks and achieve performance comparable to fully optimized MLPs for PIML. To better understand the behavior of the new representation models, we analyze their geometric complexity and learning dynamics using information bottleneck theory, identifying three universal learning stages, fitting, transition, and diffusion, across all types of architectures. We find a strong correlation between geometric complexity and signal-to-noise ratio (SNR), with optimal generalization achieved during the diffusion stage. Additionally, we propose self-scaled residual-based attention weights to maintain high SNR dynamically, ensuring uniform convergence and prolonged learning.
Abstract:Improving diesel engine efficiency and emission reduction have been critical research topics. Recent government regulations have shifted this focus to another important area related to engine health and performance monitoring. Although the advancements in the use of deep learning methods for system monitoring have shown promising results in this direction, designing efficient methods suitable for field systems remains an open research challenge. The objective of this study is to develop a computationally efficient neural network-based approach for identifying unknown parameters of a mean value diesel engine model to facilitate physics-based health monitoring and maintenance forecasting. We propose a hybrid method combining physics informed neural networks, PINNs, and a deep neural operator, DeepONet to predict unknown parameters and gas flow dynamics in a diesel engine. The operator network predicts independent actuator dynamics learnt through offline training, thereby reducing the PINNs online computational cost. To address PINNs need for retraining with changing input scenarios, we propose two transfer learning (TL) strategies. The first strategy involves multi-stage transfer learning for parameter identification. While this method is computationally efficient as compared to online PINN training, improvements are required to meet field requirements. The second TL strategy focuses solely on training the output weights and biases of a subset of multi-head networks pretrained on a larger dataset, substantially reducing computation time during online prediction. We also evaluate our model for epistemic and aleatoric uncertainty by incorporating dropout in pretrained networks and Gaussian noise in the training dataset. This strategy offers a tailored, computationally inexpensive, and physics-based approach for parameter identification in diesel engine sub systems.
Abstract:Spiking neural networks (SNNs) represent a promising approach in machine learning, combining the hierarchical learning capabilities of deep neural networks with the energy efficiency of spike-based computations. Traditional end-to-end training of SNNs is often based on back-propagation, where weight updates are derived from gradients computed through the chain rule. However, this method encounters challenges due to its limited biological plausibility and inefficiencies on neuromorphic hardware. In this study, we introduce an alternative training approach for SNNs. Instead of using back-propagation, we leverage weight perturbation methods within a forward-mode gradient framework. Specifically, we perturb the weight matrix with a small noise term and estimate gradients by observing the changes in the network output. Experimental results on regression tasks, including solving various PDEs, show that our approach achieves competitive accuracy, suggesting its suitability for neuromorphic systems and potential hardware compatibility.
Abstract:Physics-Informed Neural Networks (PINNs) have emerged as a key tool in Scientific Machine Learning since their introduction in 2017, enabling the efficient solution of ordinary and partial differential equations using sparse measurements. Over the past few years, significant advancements have been made in the training and optimization of PINNs, covering aspects such as network architectures, adaptive refinement, domain decomposition, and the use of adaptive weights and activation functions. A notable recent development is the Physics-Informed Kolmogorov-Arnold Networks (PIKANS), which leverage a representation model originally proposed by Kolmogorov in 1957, offering a promising alternative to traditional PINNs. In this review, we provide a comprehensive overview of the latest advancements in PINNs, focusing on improvements in network design, feature expansion, optimization techniques, uncertainty quantification, and theoretical insights. We also survey key applications across a range of fields, including biomedicine, fluid and solid mechanics, geophysics, dynamical systems, heat transfer, chemical engineering, and beyond. Finally, we review computational frameworks and software tools developed by both academia and industry to support PINN research and applications.
Abstract:The interplay between stochastic processes and optimal control has been extensively explored in the literature. With the recent surge in the use of diffusion models, stochastic processes have increasingly been applied to sample generation. This paper builds on the log transform, known as the Cole-Hopf transform in Brownian motion contexts, and extends it within a more abstract framework that includes a linear operator. Within this framework, we found that the well-known relationship between the Cole-Hopf transform and optimal transport is a particular instance where the linear operator acts as the infinitesimal generator of a stochastic process. We also introduce a novel scenario where the linear operator is the adjoint of the generator, linking to Bayesian inference under specific initial and terminal conditions. Leveraging this theoretical foundation, we develop a new algorithm, named the HJ-sampler, for Bayesian inference for the inverse problem of a stochastic differential equation with given terminal observations. The HJ-sampler involves two stages: (1) solving the viscous Hamilton-Jacobi partial differential equations, and (2) sampling from the associated stochastic optimal control problem. Our proposed algorithm naturally allows for flexibility in selecting the numerical solver for viscous HJ PDEs. We introduce two variants of the solver: the Riccati-HJ-sampler, based on the Riccati method, and the SGM-HJ-sampler, which utilizes diffusion models. We demonstrate the effectiveness and flexibility of the proposed methods by applying them to solve Bayesian inverse problems involving various stochastic processes and prior distributions, including applications that address model misspecifications and quantifying model uncertainty.
Abstract:We integrate neural operators with diffusion models to address the spectral limitations of neural operators in surrogate modeling of turbulent flows. While neural operators offer computational efficiency, they exhibit deficiencies in capturing high-frequency flow dynamics, resulting in overly smooth approximations. To overcome this, we condition diffusion models on neural operators to enhance the resolution of turbulent structures. Our approach is validated for different neural operators on diverse datasets, including a high Reynolds number jet flow simulation and experimental Schlieren velocimetry. The proposed method significantly improves the alignment of predicted energy spectra with true distributions compared to neural operators alone. Additionally, proper orthogonal decomposition analysis demonstrates enhanced spectral fidelity in space-time. This work establishes a new paradigm for combining generative models with neural operators to advance surrogate modeling of turbulent systems, and it can be used in other scientific applications that involve microstructure and high-frequency content. See our project page: vivekoommen.github.io/NO_DM
Abstract:Physics-informed machine learning (PIML) has emerged as a promising alternative to classical methods for predicting dynamical systems, offering faster and more generalizable solutions. However, existing models, including recurrent neural networks (RNNs), transformers, and neural operators, face challenges such as long-time integration, long-range dependencies, chaotic dynamics, and extrapolation, to name a few. To this end, this paper introduces state-space models implemented in Mamba for accurate and efficient dynamical system operator learning. Mamba addresses the limitations of existing architectures by dynamically capturing long-range dependencies and enhancing computational efficiency through reparameterization techniques. To extensively test Mamba and compare against another 11 baselines, we introduce several strict extrapolation testbeds that go beyond the standard interpolation benchmarks. We demonstrate Mamba's superior performance in both interpolation and challenging extrapolation tasks. Mamba consistently ranks among the top models while maintaining the lowest computational cost and exceptional extrapolation capabilities. Moreover, we demonstrate the good performance of Mamba for a real-world application in quantitative systems pharmacology for assessing the efficacy of drugs in tumor growth under limited data scenarios. Taken together, our findings highlight Mamba's potential as a powerful tool for advancing scientific machine learning in dynamical systems modeling. (The code will be available at https://github.com/zheyuanhu01/State_Space_Model_Neural_Operator upon acceptance.)