University of Cincinnati
Abstract:Understanding the effect of uncertainty and noise in data on machine learning models (MLM) is crucial in developing trust and measuring performance. In this paper, a new model is proposed to quantify uncertainties and noise in data on MLMs. Using the concept of signal-to-noise ratio (SNR), a new metric called deterministic-non-deterministic ratio (DDR) is proposed to formulate performance of a model. Using synthetic data in experiments, we show how accuracy can change with DDR and how we can use DDR-accuracy curves to determine performance of a model.
Abstract:Detecting anomalies in multivariate time series(MTS) data plays an important role in many domains. The abnormal values could indicate events, medical abnormalities,cyber-attacks, or faulty devices which if left undetected could lead to significant loss of resources, capital, or human lives. In this paper, we propose a novel and innovative approach to anomaly detection called Bayesian State-Space Anomaly Detection(BSSAD). The BSSAD consists of two modules: the neural network module and the Bayesian state-space module. The design of our approach combines the strength of Bayesian state-space algorithms in predicting the next state and the effectiveness of recurrent neural networks and autoencoders in understanding the relationship between the data to achieve high accuracy in detecting anomalies. The modular design of our approach allows flexibility in implementation with the option of changing the parameters of the Bayesian state-space models or swap-ping neural network algorithms to achieve different levels of performance. In particular, we focus on using Bayesian state-space models of particle filters and ensemble Kalman filters. We conducted extensive experiments on five different datasets. The experimental results show the superior performance of our model over baselines, achieving an F1-score greater than 0.95. In addition, we also propose using a metric called MatthewCorrelation Coefficient (MCC) to obtain more comprehensive information about the accuracy of anomaly detection.
Abstract:Channel Attention reigns supreme as an effective technique in the field of computer vision. However, the proposed channel attention by SENet suffers from information loss in feature learning caused by the use of Global Average Pooling (GAP) to represent channels as scalars. Thus, designing effective channel attention mechanisms requires finding a solution to enhance features preservation in modeling channel inter-dependencies. In this work, we utilize Wavelet transform compression as a solution to the channel representation problem. We first test wavelet transform as an Auto-Encoder model equipped with conventional channel attention module. Next, we test wavelet transform as a standalone channel compression method. We prove that global average pooling is equivalent to the recursive approximate Haar wavelet transform. With this proof, we generalize channel attention using Wavelet compression and name it WaveNet. Implementation of our method can be embedded within existing channel attention methods with a couple of lines of code. We test our proposed method using ImageNet dataset for image classification task. Our method outperforms the baseline SENet, and achieves the state-of-the-art results. Our code implementation is publicly available at https://github.com/hady1011/WaveNet-C.
Abstract:The GHM multi-level discrete wavelet transform is proposed as preprocessing for image super resolution with convolutional neural networks. Previous works perform analysis with the Haar wavelet only. In this work, 37 single-level wavelets are experimentally analyzed from Haar, Daubechies, Biorthogonal, Reverse Biorthogonal, Coiflets, and Symlets wavelet families. All single-level wavelets report similar results indicating that the convolutional neural network is invariant to choice of wavelet in a single-level filter approach. However, the GHM multi-level wavelet achieves higher quality reconstructions than the single-level wavelets. Three large data sets are used for the experiments: DIV2K, a dataset of textures, and a dataset of satellite images. The approximate high resolution images are compared using seven objective error measurements. A convolutional neural network based approach using wavelet transformed images has good results in the literature.
Abstract:In the modern age of social media and networks, graph representations of real-world phenomena have become an incredibly useful source to mine insights. Often, we are interested in understanding how entities in a graph are interconnected. The Graph Neural Network (GNN) has proven to be a very useful tool in a variety of graph learning tasks including node classification, link prediction, and edge classification. However, in most of these tasks, the graph data we are working with may be noisy and may contain spurious edges. That is, there is a lot of uncertainty associated with the underlying graph structure. Recent approaches to modeling uncertainty have been to use a Bayesian framework and view the graph as a random variable with probabilities associated with model parameters. Introducing the Bayesian paradigm to graph-based models, specifically for semi-supervised node classification, has been shown to yield higher classification accuracies. However, the method of graph inference proposed in recent work does not take into account the structure of the graph. In this paper, we propose a novel algorithm called Bayesian Graph Convolutional Network using Neighborhood Random Walk Sampling (BGCN-NRWS), which uses a Markov Chain Monte Carlo (MCMC) based graph sampling algorithm utilizing graph structure, reduces overfitting by using a variational inference layer, and yields consistently competitive classification results compared to the state-of-the-art in semi-supervised node classification.
Abstract:Question Answering (QA) is a task in natural language processing that has seen considerable growth after the advent of transformers. There has been a surge in QA datasets that have been proposed to challenge natural language processing models to improve human and existing model performance. Many pre-trained language models have proven to be incredibly effective at the task of extractive question answering. However, generalizability remains as a challenge for the majority of these models. That is, some datasets require models to reason more than others. In this paper, we train various pre-trained language models and fine-tune them on multiple question answering datasets of varying levels of difficulty to determine which of the models are capable of generalizing the most comprehensively across different datasets. Further, we propose a new architecture, BERT-BiLSTM, and compare it with other language models to determine if adding more bidirectionality can improve model performance. Using the F1-score as our metric, we find that the RoBERTa and BART pre-trained models perform the best across all datasets and that our BERT-BiLSTM model outperforms the baseline BERT model.