Abstract: High-precision modeling of systems is one of the main areas of industrial data analysis. Models of such systems, known as digital twins, are used to predict their behavior under various conditions. We have developed several models of a storage system using machine learning-based generative models. The system consists of several components: hard disk drive (HDD) and solid-state drive (SSD) storage pools with different RAID schemes, and a cache. Each storage component is represented by a probabilistic model that describes the probability distribution of the component's performance, in terms of IOPS and latency, as a function of its configuration and the external data load parameters. In the experiments, the prediction errors are 4-10% for IOPS and 3-16% for latency, depending on the component and the model. The predictions agree with Little's law with a Pearson correlation of up to 0.99, so the law can be used for unsupervised reliability checks of the models. In addition, we present novel data sets that can be used for benchmarking regression algorithms, conditional generative models, and uncertainty estimation methods in machine learning.
Abstract: The purpose of change point detection algorithms is to locate an abrupt change in the time evolution of a process. In this paper, we apply latent neural stochastic differential equations to the change point detection problem. We demonstrate the detection capabilities and performance of our model on a range of synthetic and real-world datasets and benchmarks. In most of the studied scenarios, the proposed algorithm outperforms state-of-the-art algorithms. We also discuss the strengths and limitations of this approach and indicate directions for further improvements.
Abstract: Accurate particle identification (PID) is one of the most important aspects of the LHCb experiment. Modern machine learning techniques such as neural networks (NNs) are efficiently applied to this problem and are integrated into the LHCb software. In this research, we discuss novel applications of neural network speed-up techniques for achieving faster PID under LHC upgrade conditions. We show that the best results are obtained with variational dropout sparsification, which speeds up prediction (the feedforward pass) by up to a factor of sixteen, even compared to a model with shallow networks.
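To illustrate where the speed-up from variational dropout sparsification comes from, here is a minimal NumPy sketch of the pruning step. It assumes the criterion commonly used with this technique: each weight has a mean θ and a learned log-variance log σ², the dropout rate is parameterized by log α = log σ² − log θ², and weights with log α above a threshold (3.0 here, an assumed value) are set to zero. The function name `sparsify` and the toy parameters are hypothetical. This is not the LHCb implementation.

```python
import numpy as np


def sparsify(theta, log_sigma2, threshold=3.0):
    """Zero out weights whose dropout rate log(alpha) exceeds `threshold`.

    theta:      weight means learned with variational dropout
    log_sigma2: learned log-variances of the same shape
    Returns the pruned weight matrix and the boolean keep-mask.
    """
    # log(alpha) = log(sigma^2) - log(theta^2); the epsilon avoids log(0).
    log_alpha = log_sigma2 - np.log(theta ** 2 + 1e-8)
    keep = log_alpha < threshold  # high alpha means the weight is mostly noise
    return theta * keep, keep


# Toy 2x2 layer with hypothetical learned parameters.
theta = np.array([[1.0, 0.01], [0.5, -2.0]])
log_sigma2 = np.array([[-4.0, -4.0], [2.0, -4.0]])

w_sparse, keep = sparsify(theta, log_sigma2)
print(w_sparse)
print("fraction of weights kept:", keep.mean())
```

At inference time, only the kept weights need to be stored and multiplied, for example in a sparse-matrix format. This is why a highly sparsified network can run its feedforward pass faster than a dense one of similar accuracy.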
Abstract: Anomaly detection for complex data is a challenging task from the perspective of machine learning. In this work, we consider cases in which certain kinds of anomalies are missing from the training dataset, while significant statistics for the normal class are available. In such scenarios, conventional supervised methods might suffer from class imbalance, while unsupervised methods tend to ignore difficult anomalous examples. We extend the idea of the supervised classification approach for class-imbalanced datasets by exploiting normalizing flows for proper Bayesian inference of the posterior probabilities.
Abstract: Anomaly detection is not an easy problem, since the distribution of anomalous samples is unknown a priori. We explore a novel method that offers a trade-off between one-class and two-class approaches and leads to better performance on anomaly detection problems with small or non-representative anomalous samples. The method is evaluated on several data sets and compared with a set of conventional one-class and two-class approaches.