Abstract: In modern data science, it is often not enough to obtain a data-driven model with good prediction quality. It is frequently more interesting to understand the properties of the model and which of its parts could be replaced to obtain better results. Such questions fall under machine learning interpretability, which can be considered one of the field's rising topics. In this paper, we use multi-objective evolutionary optimization for composite data-driven model learning to obtain the desired properties of the algorithm. That is, while one of the apparent objectives is precision, the other can be chosen as the complexity of the model, its robustness, or many others. The application of the method is shown on examples of multi-objective learning of composite models, differential equations, and closed-form algebraic expressions, which are unified to form an approach for model-agnostic learning of interpretable models.
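The trade-off between precision and a second objective such as complexity can be illustrated with a Pareto-dominance filter. The following is only a minimal sketch and not the paper's algorithm: the candidate names, error values, and complexity scores are hypothetical, and both objectives are assumed to be minimized.

```python
from typing import List, Tuple

# (name, error, complexity): hypothetical candidates, both objectives minimized
Candidate = Tuple[str, float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is no worse than b on both objectives and better on one."""
    no_worse = a[1] <= b[1] and a[2] <= b[2]
    strictly_better = a[1] < b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Illustrative values only; they are not results from the paper.
candidates = [
    ("linear", 0.30, 2),
    ("tree", 0.22, 7),
    ("ensemble", 0.15, 15),
    ("bloated_tree", 0.25, 12),   # dominated by "tree"
    ("deep_ensemble", 0.15, 20),  # dominated by "ensemble"
]
front = pareto_front(candidates)
print([name for name, _, _ in front])
```

A multi-objective evolutionary optimizer would typically return such a set of non-dominated models rather than a single best one, leaving the choice between a simpler and a more precise model to the user.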
Abstract: The effectiveness of machine learning methods for real-world tasks depends on the proper structure of the modeling pipeline. The proposed approach aims to automate the design of composite machine learning pipelines, which are equivalent to computational workflows that consist of models and data operations. The approach combines key ideas of both automated machine learning and workflow management systems. It designs pipelines with a customizable graph-based structure, analyzes the obtained results, and reproduces them. An evolutionary approach is used for the flexible identification of the pipeline structure. Additional algorithms for sensitivity analysis, atomization, and hyperparameter tuning are implemented to improve the effectiveness of the approach. The software implementation of this approach is presented as an open-source framework. A set of experiments is conducted for different datasets and tasks (classification, regression, time series forecasting). The obtained results confirm the correctness and effectiveness of the proposed approach in comparison with state-of-the-art competitors and baseline solutions.
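The idea of a pipeline with a graph-based structure can be sketched as a directed acyclic graph whose nodes are operations and whose edges pass outputs to downstream nodes. This is an assumption-laden illustration using only the Python standard library, not the framework's API: the node names `scale`, `shift`, and `average` and the toy operations behind them are hypothetical stand-ins for real models and data operations.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List

Vector = List[float]
Operation = Callable[[List[Vector]], Vector]

# Hypothetical pipeline: each node maps to the list of its parent nodes.
graph: Dict[str, List[str]] = {
    "scale": [],
    "shift": [],
    "average": ["scale", "shift"],
}

# Toy operations; primary nodes receive the raw data as their only input.
operations: Dict[str, Operation] = {
    "scale": lambda inputs: [x / 10.0 for x in inputs[0]],
    "shift": lambda inputs: [x + 1.0 for x in inputs[0]],
    "average": lambda inputs: [sum(vals) / len(vals) for vals in zip(*inputs)],
}


def run_pipeline(graph: Dict[str, List[str]],
                 operations: Dict[str, Operation],
                 data: Vector) -> Dict[str, Vector]:
    """Evaluate every node after its parents and return all node outputs."""
    outputs: Dict[str, Vector] = {}
    for node in TopologicalSorter(graph).static_order():
        parents = graph[node]
        inputs = [outputs[p] for p in parents] if parents else [data]
        outputs[node] = operations[node](inputs)
    return outputs


result = run_pipeline(graph, operations, [10.0, 20.0])
print(result["average"])
```

With this representation, a structural search only has to edit the `graph` dictionary (adding, removing, or rewiring nodes) and re-evaluate the pipeline, which is the kind of change an evolutionary operator would make.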
Abstract: Time series analysis is widely used in various fields of science and industry. However, the vast majority of time series obtained from real sources contain a large number of gaps, have a complex character, and can contain incorrect or missing parts. It is therefore useful to have a convenient, efficient, and flexible instrument for filling the gaps in time series. In this paper, we propose an approach for filling the gaps using evolutionary automated machine learning, implemented as a part of the FEDOT framework. Automated identification of the optimal data-driven model structure allows the gap-filling strategy to be adapted to the specific problem. As a case study, a multivariate sea surface height dataset is used. In the experimental studies, the proposed approach was compared with other gap-filling methods, and the composite models provided a higher quality of gap restoration.
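To make the gap-filling task concrete, the sketch below shows a simple linear-interpolation baseline and how restoration quality can be measured on an artificially removed segment. It is not the evolutionary approach proposed in the paper and does not use the FEDOT API; the synthetic sine series, the gap position, and the function name `fill_gaps_linear` are assumptions made for illustration.

```python
import math
from typing import List


def fill_gaps_linear(series: List[float]) -> List[float]:
    """Fill NaN-marked gaps by linear interpolation between known neighbours.

    Gaps at the edges are filled with the nearest known value.
    """
    known = [i for i, v in enumerate(series) if not math.isnan(v)]
    if not known:
        raise ValueError("series has no known values")
    filled = list(series)
    for i, value in enumerate(series):
        if not math.isnan(value):
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = (1 - weight) * series[left] + weight * series[right]
    return filled


# Synthetic series with an artificial gap, so the true values are known.
truth = [math.sin(2 * math.pi * t / 49) for t in range(50)]
gap = range(20, 25)
with_gaps = list(truth)
for i in gap:
    with_gaps[i] = math.nan

restored = fill_gaps_linear(with_gaps)
# Mean absolute error over the gap only
mae = sum(abs(restored[i] - truth[i]) for i in gap) / len(gap)
print(f"MAE on the restored gap: {mae:.4f}")
```

Hiding known values and scoring their restoration in this way gives a common yardstick for comparing a baseline against more elaborate gap-filling models.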