Abstract: The objective of this study is to predict suicidal and non-suicidal deaths from DNA methylation data using a modern machine learning algorithm. We used support vector machines (SVMs) to classify existing secondary data, consisting of normalized methylated DNA probe intensities from tissue of two cortical brain regions, in order to distinguish suicide cases from control cases. Before classification, we applied principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE) to reduce the dimensionality of the data. Compared with PCA, the more recent visualization method t-SNE performs better at this dimensionality reduction, because it can preserve non-linear structure in the high-dimensional data when mapping it into a low-dimensional space. We applied four-fold cross-validation, using the low-dimensional output of t-SNE as training data for the SVM. Despite the use of cross-validation, the nominally perfect prediction of suicidal deaths for the BA11 (Brodmann area 11) data suggests possible over-fitting of the model. The study may also have suffered from spectrum bias, since individuals were studied only in two extreme scenarios. This research constitutes a baseline study for classifying suicidal and non-suicidal deaths from DNA methylation data. Future studies with larger sample sizes, possibly incorporating methylation data from living individuals, may reduce this bias and improve the accuracy of the results.
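As an illustration only, the pipeline described above (PCA, then t-SNE, then an SVM scored with four-fold cross-validation) could be sketched with scikit-learn as below. The abstract does not report the number of principal components, the t-SNE perplexity, or the SVM kernel and cost, so `n_pca=20`, `perplexity=5.0`, an RBF kernel with `C=1.0`, and the helper name `tsne_svm_cv` are all hypothetical choices, and the data are synthetic random numbers rather than methylation measurements.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC


def tsne_svm_cv(X, y, n_pca=20, perplexity=5.0, n_splits=4, seed=0):
    """Hypothetical helper: PCA -> 2-D t-SNE -> SVM, scored with stratified
    k-fold cross-validation. Returns one accuracy per fold."""
    # PCA first: a common pre-step that denoises the data and speeds up t-SNE.
    n_pca = min(n_pca, X.shape[0], X.shape[1])
    X_pca = PCA(n_components=n_pca, random_state=seed).fit_transform(X)
    # scikit-learn's TSNE has no transform() for unseen samples, so here it is
    # fit on all samples before the folds are split. Test-fold samples then
    # shape the embedding, which can make the cross-validated score optimistic.
    emb = TSNE(n_components=2, perplexity=perplexity, init="pca",
               random_state=seed).fit_transform(X_pca)
    # Stratified folds keep the case/control ratio similar in every fold.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return cross_val_score(SVC(kernel="rbf", C=1.0), emb, y, cv=cv)


# Synthetic stand-in data: 40 samples x 1000 "probes", two balanced classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 1000))
y = np.repeat([0, 1], 20)
X[y == 1, :50] += 1.0  # shift a subset of features in one class
scores = tsne_svm_cv(X, y)
```

Because t-SNE is fit before splitting in this sketch, each fold's SVM is trained on an embedding that already "saw" its test samples; that is one plausible route to the optimistic, nominally perfect results the abstract cautions about, and is worth ruling out before interpreting such scores.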
Abstract: Respiratory infections and chronic respiratory diseases impose a heavy health burden worldwide. Coughing is one of the most common symptoms of many such infections and can indicate flare-ups of chronic respiratory diseases. At both the clinical and the public health level, the capacity to identify bouts of coughing can aid understanding of population and individual health status. Health monitoring models for respiratory diseases, and for seasonal diseases with symptoms such as cough, have the potential to improve quality of life, support the decisions of clinicians and public health authorities, and decrease the cost of health services. In this paper, we investigated the extent to which a simple machine learning approach, hidden Markov models (HMMs), could classify different states of coughing using univariate (a single energy band as the input feature) and multivariate (multiple energy bands as the input features) binned time series of cough data. We further used the model to distinguish cough events from other events and environmental noise. Our HMM algorithm achieved an area under the receiver operating characteristic curve (AUC) of 92% in classifying coughing events in noisy environments. Moreover, a comparison of univariate and multivariate HMMs suggests that multivariate HMMs classify cough events with higher accuracy than univariate HMMs.
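One common way to use HMMs for this kind of event classification is to score each binned sequence under a "cough" model and a "noise" model and rank sequences by the log-likelihood ratio; the abstract does not say how the authors trained or scored their models, so the following is only a minimal sketch under that assumption. It uses Gaussian emissions with diagonal covariance (so the same code handles one or several energy bands), the forward algorithm in log space, and the Mann-Whitney form of the AUC. The class `GaussianHMM`, the functions `score` and `auc`, all model parameters, and all sequences are hypothetical and hand-set; no Baum-Welch training is shown.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def log_gauss_diag(x: Vec, mean: Vec, var: Vec) -> float:
    # Log density of a diagonal-covariance Gaussian; with one energy band this
    # reduces to the ordinary univariate Gaussian.
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def logsumexp(vals: Sequence[float]) -> float:
    # Numerically stable log(sum(exp(vals))).
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


class GaussianHMM:
    """Hypothetical HMM whose states each emit a diagonal Gaussian over bands."""

    def __init__(self, start: List[float], trans: List[List[float]],
                 means: List[Vec], variances: List[Vec]):
        # All probabilities must be strictly positive because logs are taken.
        self.log_start = [math.log(p) for p in start]
        self.log_trans = [[math.log(p) for p in row] for row in trans]
        self.means = means
        self.vars = variances

    def log_likelihood(self, obs: Sequence[Vec]) -> float:
        # Forward algorithm in log space:
        # alpha[j] = log P(o_1, ..., o_t, state_t = j).
        n = len(self.means)
        alpha = [self.log_start[j] + log_gauss_diag(obs[0], self.means[j], self.vars[j])
                 for j in range(n)]
        for x in obs[1:]:
            alpha = [logsumexp([alpha[i] + self.log_trans[i][j] for i in range(n)])
                     + log_gauss_diag(x, self.means[j], self.vars[j])
                     for j in range(n)]
        return logsumexp(alpha)


def auc(pos: Sequence[float], neg: Sequence[float]) -> float:
    # AUC as the probability that a random positive outscores a random
    # negative, counting ties as one half (Mann-Whitney formulation).
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hand-set models over one energy band (not fitted to any data).
cough_hmm = GaussianHMM(start=[0.8, 0.2],
                        trans=[[0.7, 0.3], [0.4, 0.6]],  # states: quiet, burst
                        means=[[0.1], [1.0]], variances=[[0.01], [0.1]])
noise_hmm = GaussianHMM(start=[1.0], trans=[[1.0]],
                        means=[[0.3]], variances=[[0.05]])

# Made-up binned energy sequences, one value per time bin.
coughs = [[0.1, 0.9, 1.1, 0.2, 0.1], [0.1, 0.1, 1.2, 0.8, 0.1]]
noises = [[0.3, 0.35, 0.25, 0.3, 0.3], [0.4, 0.2, 0.3, 0.3, 0.35]]


def score(seq: Sequence[float]) -> float:
    # Log-likelihood ratio: positive means the cough model explains seq better.
    obs = [[x] for x in seq]
    return cough_hmm.log_likelihood(obs) - noise_hmm.log_likelihood(obs)


auc_value = auc([score(s) for s in coughs], [score(s) for s in noises])
```

A multivariate variant needs no new code: give each state a vector of per-band means and variances (for example `means=[[0.1, 0.05], [1.0, 0.6]]`) and pass one vector per time bin instead of wrapping a single value.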