University of Cincinnati
Abstract:Understanding the effect of uncertainty and noise in data on machine learning models (MLM) is crucial in developing trust and measuring performance. In this paper, a new model is proposed to quantify uncertainties and noise in data on MLMs. Using the concept of signal-to-noise ratio (SNR), a new metric called deterministic-non-deterministic ratio (DDR) is proposed to formulate performance of a model. Using synthetic data in experiments, we show how accuracy can change with DDR and how we can use DDR-accuracy curves to determine performance of a model.
Abstract:Detecting anomalies in multivariate time series(MTS) data plays an important role in many domains. The abnormal values could indicate events, medical abnormalities,cyber-attacks, or faulty devices which if left undetected could lead to significant loss of resources, capital, or human lives. In this paper, we propose a novel and innovative approach to anomaly detection called Bayesian State-Space Anomaly Detection(BSSAD). The BSSAD consists of two modules: the neural network module and the Bayesian state-space module. The design of our approach combines the strength of Bayesian state-space algorithms in predicting the next state and the effectiveness of recurrent neural networks and autoencoders in understanding the relationship between the data to achieve high accuracy in detecting anomalies. The modular design of our approach allows flexibility in implementation with the option of changing the parameters of the Bayesian state-space models or swap-ping neural network algorithms to achieve different levels of performance. In particular, we focus on using Bayesian state-space models of particle filters and ensemble Kalman filters. We conducted extensive experiments on five different datasets. The experimental results show the superior performance of our model over baselines, achieving an F1-score greater than 0.95. In addition, we also propose using a metric called MatthewCorrelation Coefficient (MCC) to obtain more comprehensive information about the accuracy of anomaly detection.