Abstract: This paper explores the critical role of differentiation approaches in data-driven differential equation discovery. Accurate derivatives of the input data are essential for reliable algorithmic operation, particularly in real-world scenarios where measurement quality is inevitably compromised. We propose alternatives to the commonly used finite-difference method, which is notorious for its instability in the presence of noise and can exacerbate random errors in the data. Our analysis covers four distinct methods: Savitzky-Golay filtering, spectral differentiation, smoothing based on artificial neural networks, and regularization of the derivative variation. We evaluate these methods in terms of their applicability to problems resembling real-world ones and their ability to ensure the convergence of equation discovery algorithms, providing valuable insights for robust modeling of real-world processes.
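A minimal sketch (not the paper's code) of the contrast motivating this work: differentiating noisy samples of sin(x) with finite differences versus Savitzky-Golay filtering. The window length, polynomial order, and noise level are illustrative choices.

```python
import numpy as np
from scipy.signal import savgol_filter

x = np.linspace(0.0, 2.0 * np.pi, 400)
dx = x[1] - x[0]
y_noisy = np.sin(x) + np.random.normal(scale=0.02, size=x.size)

d_fd = np.gradient(y_noisy, dx)                       # finite differences amplify the noise
d_sg = savgol_filter(y_noisy, window_length=31,       # local polynomial fit smooths the data
                     polyorder=3, deriv=1, delta=dx)  # before differentiating it

print("finite-difference RMSE:", np.sqrt(np.mean((d_fd - np.cos(x)) ** 2)))
print("Savitzky-Golay RMSE:   ", np.sqrt(np.mean((d_sg - np.cos(x)) ** 2)))
```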
Abstract: The discovery of equations that provide knowledge of the origin of a process is a tempting prospect. However, most equation discovery tools rely on gradient methods, which offer limited control over the optimization parameters. An alternative is evolutionary equation discovery, which allows modification of almost every optimization stage. In this paper, we examine the modifications that can be introduced into the evolutionary operators of the equation discovery algorithm, taking inspiration from directed evolution techniques employed in fields such as chemistry and biology. The resulting approach, dubbed directed equation discovery, demonstrates a greater ability to converge towards accurate solutions than the conventional method. To support our findings, we present experiments based on Burgers', wave, and Korteweg--de Vries equations.
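A hypothetical sketch of what a "directed" evolutionary operator could look like: instead of sampling candidate terms uniformly, mutation is biased by accumulated term statistics, mimicking the selection pressure of directed evolution. The term pool, scores, and function names are illustrative, not the paper's implementation.

```python
import random

TERM_POOL = ["u", "u_x", "u_xx", "u*u_x", "u_t"]

def directed_mutation(equation_terms, term_scores):
    """Replace one term, preferring terms with higher historical fitness scores."""
    weights = [term_scores.get(t, 1.0) for t in TERM_POOL]
    new_term = random.choices(TERM_POOL, weights=weights, k=1)[0]
    idx = random.randrange(len(equation_terms))
    mutated = list(equation_terms)
    mutated[idx] = new_term
    return mutated

# Terms that appeared in well-fitting candidates get larger scores and are drawn more often.
scores = {"u*u_x": 4.0, "u_xx": 3.0}
print(directed_mutation(["u", "u_t"], scores))
```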
Abstract: Differential equation discovery, a subfield of machine learning, is used to develop interpretable models, particularly in nature-related applications. By expertly incorporating the general parametric form of the equation of motion and appropriate differential terms, algorithms can autonomously uncover equations from data. This paper explores the prerequisites and tools for independent equation discovery without expert input, eliminating the need for assumptions about the equation form. We focus on the challenge of assessing the adequacy of discovered equations when the correct equation is unknown, aiming to provide insights for reliable equation discovery without prior knowledge of the equation form.
Abstract: Evolutionary differential equation discovery has proved to be a tool for obtaining equations with fewer a priori assumptions than conventional approaches, such as sparse symbolic regression over a complete library of possible terms. The equation discovery field contains two independent directions. The first is purely mathematical and concerns differentiation, the object of optimization, and its relation to functional spaces, among other topics. The second is dedicated purely to the statement of the optimization problem. Both topics are worth investigating to improve the algorithm's ability to handle experimental data in a more automated, artificial-intelligence-like way, without significant pre-processing and a priori knowledge of their nature. In the paper, we compare single-objective optimization, which considers only the discrepancy between the selected terms of the equation, with multi-objective optimization, which additionally takes into account the complexity of the obtained equation. The proposed comparison approach is shown on classical model examples: Burgers' equation, the wave equation, and the Korteweg--de Vries equation.
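A minimal sketch, under illustrative assumptions, of the two objectives being compared: the discrepancy between the selected terms of a candidate equation and its complexity measured as the number of active terms. The term matrix and coefficients here are placeholders, not the paper's data.

```python
import numpy as np

def objectives(coeffs, term_matrix, target_term):
    """coeffs: candidate coefficients; term_matrix: evaluated candidate terms (columns);
    target_term: the term the rest of the equation is regressed onto."""
    residual = term_matrix @ coeffs - target_term
    discrepancy = np.linalg.norm(residual)   # the single-objective setting stops here
    complexity = np.count_nonzero(coeffs)    # second objective in the multi-objective run
    return discrepancy, complexity

terms = np.random.rand(100, 5)               # placeholder evaluated terms
target = terms @ np.array([1.0, 0.0, -0.5, 0.0, 0.0])
print(objectives(np.array([1.0, 0.0, -0.5, 0.0, 0.0]), terms, target))
```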
Abstract: Most machine learning methods are used as black boxes for modelling. We may try to extract some knowledge from physics-based training methods, such as neural ordinary differential equations (neural ODEs). Neural ODEs offer advantages such as a potentially richer class of represented functions, extended interpretability compared to black-box machine learning models, and the ability to describe both trend and local behaviour. Such advantages are especially critical for time series with complicated trends. However, a known drawback is the high training time compared to the autoregressive models and long short-term memory (LSTM) networks widely used for data-driven time series modelling. Therefore, we should be able to balance interpretability and training time to apply neural ODEs in practice. The paper shows that modern neural ODEs cannot be reduced to simpler models for time-series modelling applications: their complexity is comparable to, or exceeds that of, conventional time-series modelling tools. The only interpretation that can be extracted is the eigenspace of the operator, which is an ill-posed problem for a large system. The spectra can instead be extracted using classical analysis methods that do not suffer from the extended training time. Consequently, we reduce the neural ODE to a simpler linear form and propose a new view of time-series modelling that combines neural networks with an ODE system approach.
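A sketch of the linear reduction discussed above, assuming a uniformly sampled series and linear latent dynamics: fit dx/dt ≈ A x by least squares and inspect the spectrum of A directly, rather than training a full neural ODE. The toy sine/cosine series is illustrative only.

```python
import numpy as np

dt = 0.01
t = np.arange(0.0, 10.0, dt)
X = np.stack([np.sin(t), np.cos(t)], axis=1)  # toy two-component series

dXdt = np.gradient(X, dt, axis=0)             # numerical time derivative
M, *_ = np.linalg.lstsq(X, dXdt, rcond=None)  # solves X @ M ≈ dX/dt
A = M.T                                       # so that dx/dt = A @ x

eigvals, eigvecs = np.linalg.eig(A)           # interpretable spectrum of the operator
print("eigenvalues:", np.round(eigvals, 3))   # expected close to ±1j for sin/cos dynamics
```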
Abstract: Numerical methods for solving differential equations yield a discrete field that converges towards the solution, provided the method is applied to the correct problem. Nevertheless, each numerical method is restricted to the class of equations for which convergence with a given parameter set or range has been proved. Only a few "cheap and dirty" numerical methods converge on a wide class of equations without parameter tuning, at the price of a lower approximation order. The article presents a method that uses an optimization algorithm to obtain a solution through a parameterized approximation. The result may not be as precise as an expert-tuned one; however, it allows a wide class of equations to be solved in an automated manner without changing the algorithm's parameters.
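A minimal sketch of the idea (not the article's implementation): represent the solution by a parameterized approximation, here a low-degree polynomial, and minimize the equation residual plus a boundary-condition penalty with a generic optimizer. The test problem u' + u = 0, u(0) = 1 and the optimizer choice are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

t = np.linspace(0.0, 1.0, 50)

def residual(coeffs):
    # trial solution u(t) = sum_k c_k t^k for the ODE u' + u = 0, u(0) = 1
    u = np.polyval(coeffs[::-1], t)
    du = np.polyval(np.polyder(coeffs[::-1]), t)
    bc = np.polyval(coeffs[::-1], 0.0) - 1.0
    return np.mean((du + u) ** 2) + bc ** 2

result = minimize(residual, x0=np.zeros(6), method="Nelder-Mead")
u_approx = np.polyval(result.x[::-1], t)
print("max error vs exp(-t):", np.max(np.abs(u_approx - np.exp(-t))))
```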
Abstract: In modern data science, it is often not enough to obtain a data-driven model with good prediction quality alone. On the contrary, it is more interesting to understand the properties of the model and which of its parts could be replaced to obtain better results. Such questions are unified under the topic of machine learning interpretability, which can be considered one of the area's rising topics. In the paper, we use multi-objective evolutionary optimization for learning composite data-driven models in order to obtain models with the desired properties. Whereas one of the obvious objectives is precision, the others can be chosen as the complexity of the model, its robustness, and many more. The method's application is shown on examples of multi-objective learning of composite models, differential equations, and closed-form algebraic expressions, which are unified into a single approach for model-agnostic learning of interpretable models.
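A short sketch of the multi-objective selection principle used above: a candidate model is kept only if no other candidate is at least as good in both precision (error) and complexity and strictly better in one. The numbers are illustrative placeholders.

```python
def pareto_front(candidates):
    """candidates: list of (error, complexity) tuples; both objectives are minimized."""
    front = []
    for c in candidates:
        dominated = any(
            (o[0] <= c[0] and o[1] <= c[1]) and (o[0] < c[0] or o[1] < c[1])
            for o in candidates
        )
        if not dominated:
            front.append(c)
    return front

models = [(0.10, 7), (0.12, 3), (0.30, 2), (0.11, 8), (0.35, 2)]
print(pareto_front(models))  # keeps only the non-dominated error/complexity trade-offs
```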
Abstract: Usually, systems of partial differential equations (PDEs) are discovered from observational data in the form of a single vector equation. However, this approach restricts application to real-world cases where, for example, the form of the external forcing is of interest. In the paper, a multi-objective co-evolution algorithm is described: the individual equations within the system and the system itself are evolved simultaneously to obtain the system. This approach allows discovering systems with form-independent equations. In contrast to a single vector equation, a component-wise system is more suitable for expert interpretation and, therefore, for applications. The two-dimensional Navier-Stokes equations are considered as an example.
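A hypothetical toy of the co-evolution idea: each component equation has its own small population of candidate coefficient vectors, and a whole system is assembled from the per-component best candidates and scored jointly. The data, fitness definitions, and the omission of system-level feedback into the component populations are simplifications, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n_components, pop_size = 2, 8
populations = [rng.normal(size=(pop_size, 3)) for _ in range(n_components)]
targets = [np.array([1.0, 0.0, -0.5]), np.array([0.0, 2.0, 0.0])]  # stand-in "true" coefficients

def component_fitness(candidate, target):
    return np.linalg.norm(candidate - target)  # stand-in for a per-equation residual

for generation in range(50):
    for i, pop in enumerate(populations):
        scores = [component_fitness(c, targets[i]) for c in pop]
        parents = pop[np.argsort(scores)[: pop_size // 2]]
        children = parents + rng.normal(scale=0.1, size=parents.shape)  # mutation
        populations[i] = np.vstack([parents, children])

# assemble the system from the best candidate of each component population and score it jointly
system = [pop[np.argmin([component_fitness(c, targets[i]) for c in pop])]
          for i, pop in enumerate(populations)]
print("system-level residual:", sum(component_fitness(eq, t) for eq, t in zip(system, targets)))
```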
Abstract: The paper describes the usage of intelligent approaches for field development tasks that may assist the decision-making process. We focus on the problem of well location optimization and two tasks within it: improving the quality of oil production estimation and estimating reservoir characteristics for appropriate well allocation and parametrization, using machine learning methods. For oil production estimation, we implemented and investigated the quality of three forecasting models: a physics-based one, a purely data-driven one, and a hybrid one. The CRMIP model was chosen as the physics-based approach; we compare it with the machine learning and hybrid methods within the oil production forecasting task. For the investigation of reservoir characteristics for well location choice, we automated the seismic analysis using evolutionary identification of a convolutional neural network for reservoir detection. The Volve oil field dataset was used as a case study to conduct the experiments. The implemented approaches can be used to analyze different oil fields or adapted to similar physics-related problems.
Abstract: Modern machine learning methods allow one to obtain data-driven models in various ways. However, the more complex the model is, the harder it is to interpret. In the paper, we describe an algorithm for discovering mathematical equations from given observational data. The algorithm combines genetic programming with sparse regression and allows different forms of the resulting models to be obtained; for example, it can be used for governing analytical equation discovery as well as for partial differential equation (PDE) discovery. The main idea is to collect a bag of building blocks (simple functions or their derivatives of arbitrary order) and then draw them from the bag to create combinations, which represent the terms of the final equation. The selected terms pass to the evolutionary algorithm, which is used to evolve the selection. The evolutionary steps are combined with sparse regression to pick only the significant terms. As a result, we obtain a short and interpretable expression that describes the physical process behind the data. In the paper, two examples of the algorithm's application are described: PDE discovery for metocean processes and function discovery for acoustics.
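A minimal sketch, not the paper's implementation, of the sparse-regression step: candidate terms drawn from the "bag" are regressed onto a target term, and small coefficients are pruned so that only the significant terms survive. The iterative hard-thresholding scheme, threshold value, and toy data are illustrative assumptions.

```python
import numpy as np

def sparse_coefficients(library, target, threshold=0.05, iters=5):
    """Iterative hard-thresholded least squares over the evaluated term library."""
    coeffs, *_ = np.linalg.lstsq(library, target, rcond=None)
    for _ in range(iters):
        small = np.abs(coeffs) < threshold
        coeffs[small] = 0.0
        big = ~small
        if big.any():  # refit only the surviving terms
            coeffs[big], *_ = np.linalg.lstsq(library[:, big], target, rcond=None)
    return coeffs

# toy library: columns are candidate terms; the "true" equation uses terms 0 and 2
lib = np.random.rand(200, 4)
y = 1.5 * lib[:, 0] - 0.8 * lib[:, 2] + 0.01 * np.random.randn(200)
print(sparse_coefficients(lib, y))  # expected roughly [1.5, 0, -0.8, 0]
```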