Abstract:Automated machine learning (AutoML) systems propose an end-to-end solution to a given machine learning problem, creating either fixed or flexible pipelines. Fixed pipelines are task independent constructs: their general composition remains the same, regardless of the data. In contrast, the structure of flexible pipelines varies depending on the input, making them finely tailored to individual tasks. However, flexible pipelines can be structurally overcomplicated and have poor explainability. We propose the EVOSA approach that compensates for the negative points of flexible pipelines by incorporating a sensitivity analysis which increases the robustness and interpretability of the flexible solutions. EVOSA quantitatively estimates positive and negative impact of an edge or a node on a pipeline graph, and feeds this information to the evolutionary AutoML optimizer. The correctness and efficiency of EVOSA was validated in tabular, multimodal and computer vision tasks, suggesting generalizability of the proposed approach across domains.
Abstract:The modeling and forecasting of sea ice conditions in the Arctic region are important tasks for ship routing, offshore oil production, and environmental monitoring. We propose the adaptive surrogate modeling approach named LANE-SI (Lightweight Automated Neural Ensembling for Sea Ice) that uses ensemble of relatively simple deep learning models with different loss functions for forecasting of spatial distribution for sea ice concentration in the specified water area. Experimental studies confirm the quality of a long-term forecast based on a deep learning model fitted to the specific water area is comparable to resource-intensive physical modeling, and for some periods of the year, it is superior. We achieved a 20% improvement against the state-of-the-art physics-based forecast system SEAS5 for the Kara Sea.
Abstract:Resource-intensive computations are a major factor that limits the effectiveness of automated machine learning solutions. In the paper, we propose a modular approach that can be used to increase the quality of evolutionary optimization for modelling pipelines with a graph-based structure. It consists of several stages - parallelization, caching and evaluation. Heterogeneous and remote resources can be involved in the evaluation stage. The conducted experiments confirm the correctness and effectiveness of the proposed approach. The implemented algorithms are available as a part of the open-source framework FEDOT.
Abstract:In recent years generative design techniques have become firmly established in numerous applied fields, especially in engineering. These methods are demonstrating intensive growth owing to promising outlook. However, existing approaches are limited by the specificity of problem under consideration. In addition, they do not provide desired flexibility. In this paper we formulate general approach to an arbitrary generative design problem and propose novel framework called GEFEST (Generative Evolution For Encoded STructure) on its basis. The developed approach is based on three general principles: sampling, estimation and optimization. This ensures the freedom of method adjustment for solution of particular generative design problem and therefore enables to construct the most suitable one. A series of experimental studies was conducted to confirm the effectiveness of the GEFEST framework. It involved synthetic and real-world cases (coastal engineering, microfluidics, thermodynamics and oil field planning). Flexible structure of the GEFEST makes it possible to obtain the results that surpassing baseline solutions.
Abstract:In the paper, a multi-objective evolutionary surrogate-assisted approach for the fast and effective generative design of coastal breakwaters is proposed. To approximate the computationally expensive objective functions, the deep convolutional neural network is used as a surrogate model. This model allows optimizing a configuration of breakwaters with a different number of structures and segments. In addition to the surrogate, an assistant model was developed to estimate the confidence of predictions. The proposed approach was tested on the synthetic water area, the SWAN model was used to calculate the wave heights. The experimental results confirm that the proposed approach allows obtaining more effective (less expensive with better protective properties) solutions than non-surrogate approaches for the same time.
Abstract:In modern data science, it is often not enough to obtain only a data-driven model with a good prediction quality. On the contrary, it is more interesting to understand the properties of the model, which parts could be replaced to obtain better results. Such questions are unified under machine learning interpretability questions, which could be considered one of the area's raising topics. In the paper, we use multi-objective evolutionary optimization for composite data-driven model learning to obtain the algorithm's desired properties. It means that whereas one of the apparent objectives is precision, the other could be chosen as the complexity of the model, robustness, and many others. The method application is shown on examples of multi-objective learning of composite models, differential equations, and closed-form algebraic expressions are unified and form approach for model-agnostic learning of the interpretable models.
Abstract:The effectiveness of the machine learning methods for real-world tasks depends on the proper structure of the modeling pipeline. The proposed approach is aimed to automate the design of composite machine learning pipelines, which is equivalent to computation workflows that consist of models and data operations. The approach combines key ideas of both automated machine learning and workflow management systems. It designs the pipelines with a customizable graph-based structure, analyzes the obtained results, and reproduces them. The evolutionary approach is used for the flexible identification of pipeline structure. The additional algorithms for sensitivity analysis, atomization, and hyperparameter tuning are implemented to improve the effectiveness of the approach. Also, the software implementation on this approach is presented as an open-source framework. The set of experiments is conducted for the different datasets and tasks (classification, regression, time series forecasting). The obtained results confirm the correctness and effectiveness of the proposed approach in the comparison with the state-of-the-art competitors and baseline solutions.
Abstract:The paper describes the usage of intelligent approaches for field development tasks that may assist a decision-making process. We focused on the problem of wells location optimization and two tasks within it: improving the quality of oil production estimation and estimation of reservoir characteristics for appropriate wells allocation and parametrization, using machine learning methods. For oil production estimation, we implemented and investigated the quality of forecasting models: physics-based, pure data-driven, and hybrid one. The CRMIP model was chosen as a physics-based approach. We compare it with the machine learning and hybrid methods in a frame of oil production forecasting task. In the investigation of reservoir characteristics for wells location choice, we automated the seismic analysis using evolutionary identification of convolutional neural network for the reservoir detection. The Volve oil field dataset was used as a case study to conduct the experiments. The implemented approaches can be used to analyze different oil fields or adapted to similar physics-related problems.
Abstract:In this paper, a multipurpose Bayesian-based method for data analysis, causal inference and prediction in the sphere of oil and gas reservoir development is considered. This allows analysing parameters of a reservoir, discovery dependencies among parameters (including cause and effects relations), checking for anomalies, prediction of expected values of missing parameters, looking for the closest analogues, and much more. The method is based on extended algorithm MixLearn@BN for structural learning of Bayesian networks. Key ideas of MixLearn@BN are following: (1) learning the network structure on homogeneous data subsets, (2) assigning a part of the structure by an expert, and (3) learning the distribution parameters on mixed data (discrete and continuous). Homogeneous data subsets are identified as various groups of reservoirs with similar features (analogues), where similarity measure may be based on several types of distances. The aim of the described technique of Bayesian network learning is to improve the quality of predictions and causal inference on such networks. Experimental studies prove that the suggested method gives a significant advantage in missing values prediction and anomalies detection accuracy. Moreover, the method was applied to the database of more than a thousand petroleum reservoirs across the globe and allowed to discover novel insights in geological parameters relationships.
Abstract:In this paper, a multi-objective approach for the design of composite data-driven mathematical models is proposed. It allows automating the identification of graph-based heterogeneous pipelines that consist of different blocks: machine learning models, data preprocessing blocks, etc. The implemented approach is based on a parameter-free genetic algorithm (GA) for model design called GPComp@Free. It is developed to be part of automated machine learning solutions and to increase the efficiency of the modeling pipeline automation. A set of experiments was conducted to verify the correctness and efficiency of the proposed approach and substantiate the selected solutions. The experimental results confirm that a multi-objective approach to the model design allows achieving better diversity and quality of obtained models. The implemented approach is available as a part of the open-source AutoML framework FEDOT.