Abstract:The large volume and complexity of medical imaging datasets are bottlenecks for storage, transmission, and processing. To tackle these challenges, the application of low-rank matrix approximation (LRMA) and its derivative, local LRMA (LLRMA) has demonstrated potential. This paper conducts a systematic literature review to showcase works applying LRMA and LLRMA in medical imaging. A detailed analysis of the literature identifies LRMA and LLRMA methods applied to various imaging modalities. This paper addresses the challenges and limitations associated with existing LRMA and LLRMA methods. We note a significant shift towards a preference for LLRMA in the medical imaging field since 2015, demonstrating its potential and effectiveness in capturing complex structures in medical data compared to LRMA. Acknowledging the limitations of shallow similarity methods used with LLRMA, we suggest advanced semantic image segmentation for similarity measure, explaining in detail how it can measure similar patches and their feasibility. We note that LRMA and LLRMA are mainly applied to unstructured medical data, and we propose extending their application to different medical data types, including structured and semi-structured. This paper also discusses how LRMA and LLRMA can be applied to regular data with missing entries and the impact of inaccuracies in predicting missing values and their effects. We discuss the impact of patch size and propose the use of random search (RS) to determine the optimal patch size. To enhance feasibility, a hybrid approach using Bayesian optimization and RS is proposed, which could improve the application of LRMA and LLRMA in medical imaging.
Abstract:Anomaly detection is a critical task that involves the identification of data points that deviate from a predefined pattern, useful for fraud detection and related activities. Various techniques are employed for anomaly detection, but recent research indicates that deep learning methods, with their ability to discern intricate data patterns, are well-suited for this task. This study explores the use of Generative Adversarial Networks (GANs) for anomaly detection in power generation plants. The dataset used in this investigation comprises fuel consumption records obtained from power generation plants operated by a telecommunications company. The data was initially collected in response to observed irregularities in the fuel consumption patterns of the generating sets situated at the company's base stations. The dataset was divided into anomalous and normal data points based on specific variables, with 64.88% classified as normal and 35.12% as anomalous. An analysis of feature importance, employing the random forest classifier, revealed that Running Time Per Day exhibited the highest relative importance. A GANs model was trained and fine-tuned both with and without data augmentation, with the goal of increasing the dataset size to enhance performance. The generator model consisted of five dense layers using the tanh activation function, while the discriminator comprised six dense layers, each integrated with a dropout layer to prevent overfitting. Following data augmentation, the model achieved an accuracy rate of 98.99%, compared to 66.45% before augmentation. This demonstrates that the model nearly perfectly classified data points into normal and anomalous categories, with the augmented data significantly enhancing the GANs' performance in anomaly detection. Consequently, this study recommends the use of GANs, particularly when using large datasets, for effective anomaly detection.
Abstract:One of the critical factors that drive the economic development of a country and guarantee the sustainability of its industries is the constant availability of electricity. This is usually provided by the national electric grid. However, in developing countries where companies are emerging on a constant basis including telecommunication industries, those are still experiencing a non-stable electricity supply. Therefore, they have to rely on generators to guarantee their full functionality. Those generators depend on fuel to function and the rate of consumption gets usually high, if not monitored properly. Monitoring operation is usually carried out by a (non-expert) human. In some cases, this could be a tedious process, as some companies have reported an exaggerated high consumption rate. This work proposes a label assisted autoencoder for anomaly detection in the fuel consumed by power generating plants. In addition to the autoencoder model, we added a labelling assistance module that checks if an observation is labelled, the label is used to check the veracity of the corresponding anomaly classification given a threshold. A consensus is then reached on whether training should stop or whether the threshold should be updated or the training should continue with the search for hyper-parameters. Results show that the proposed model is highly efficient for reading anomalies with a detection accuracy of $97.20\%$ which outperforms the existing model of $96.1\%$ accuracy trained on the same dataset. In addition, the proposed model is able to classify the anomalies according to their degree of severity.
Abstract:Deep learning is increasingly gaining rapid adoption in healthcare to help improve patient outcomes. This is more so in medical image analysis which requires extensive training to gain the requisite expertise to become a trusted practitioner. However, while deep learning techniques have continued to provide state-of-the-art predictive performance, one of the primary challenges that stands to hinder this progress in healthcare is the opaque nature of the inference mechanism of these models. So, attribution has a vital role in building confidence in stakeholders for the predictions made by deep learning models to inform clinical decisions. This work seeks to answer the question: what do deep neural network models learn in medical images? In that light, we present a novel attribution framework using adaptive path-based gradient integration techniques. Results show a promising direction of building trust in domain experts to improve healthcare outcomes by allowing them to understand the input-prediction correlative structures, discover new bio-markers, and reveal potential model biases.
Abstract:The instability of power generation from national grids has led industries (e.g., telecommunication) to rely on plant generators to run their businesses. However, these secondary generators create additional challenges such as fuel leakages in and out of the system and perturbations in the fuel level gauges. Consequently, telecommunication operators have been involved in a constant need for fuel to supply diesel generators. With the increase in fuel prices due to socio-economic factors, excessive fuel consumption and fuel pilferage become a problem, and this affects the smooth run of the network companies. In this work, we compared four machine learning algorithms (i.e. Gradient Boosting, Random Forest, Neural Network, and Lasso) to predict the amount of fuel consumed by a power generation plant. After evaluating the predictive accuracy of these models, the Gradient Boosting model out-perform the other three regressor models with the highest Nash efficiency value of 99.1%.
Abstract:Automatic Speech Recognition (ASR) is an active field of research due to its huge number of applications and the proliferation of interfaces or computing devices that can support speech processing. But the bulk of applications is based on well-resourced languages that overshadow under-resourced ones. Yet ASR represents an undeniable mean to promote such languages, especially when design human-to-human or human-to-machine systems involving illiterate people. An approach to design an ASR system targeting under-resourced languages is to start with a limited vocabulary. ASR using a limited vocabulary is a subset of the speech recognition problem that focuses on the recognition of a small number of words or sentences. This paper aims to provide a comprehensive view of mechanisms behind ASR systems as well as techniques, tools, projects, recent contributions, and possibly future directions in ASR using a limited vocabulary. This work consequently provides a way to go when designing ASR system using limited vocabulary. Although an emphasis is put on limited vocabulary, most of the tools and techniques reported in this survey applied to ASR systems in general.