Abstract:Should we input known genome sequence features or input sequence itself in deep learning framework? As deep learning more popular in various applications, researchers often come to question whether to generate features or use raw sequences for deep learning. To answer this question, we study the prediction accuracy of precursor miRNA prediction of feature-based deep belief network and sequence-based convolution neural network. Tested on a variant of six-layer convolution neural net and three-layer deep belief network, we find the raw sequence input based convolution neural network model performs similar or slightly better than feature based deep belief networks with best accuracy values of 0.995 and 0.990, respectively. Both the models outperform existing benchmarks models. The results shows us that if provided large enough data, well devised raw sequence based deep learning models can replace feature based deep learning models. However, construction of well behaved deep learning model can be very challenging. In cased features can be easily extracted, feature-based deep learning models may be a better alternative.
Abstract:The medical research facilitates to acquire a diverse type of data from the same individual for particular cancer. Recent studies show that utilizing such diverse data results in more accurate predictions. The major challenge faced is how to utilize such diverse data sets in an effective way. In this paper, we introduce a multiple kernel based pipeline for integrative analysis of high-throughput molecular data (somatic mutation, copy number alteration, DNA methylation and mRNA) and clinical data. We apply the pipeline on Ovarian cancer data from TCGA. After multiple kernels have been generated from the weighted sum of individual kernels, it is used to stratify patients and predict clinical outcomes. We examine the survival time, vital status, and neoplasm cancer status of each subtype to verify how well they cluster. We have also examined the power of molecular and clinical data in predicting dichotomized overall survival data and to classify the tumor grade for the cancer samples. It was observed that the integration of various data types yields higher log-rank statistics value. We were also able to predict clinical status with higher accuracy as compared to using individual data types.
Abstract:How do we determine the mutational effects in exome sequencing data with little or no statistical evidence? Can protein structural information fill in the gap of not having enough statistical evidence? In this work, we answer the two questions with the goal towards determining pathogenic effects of rare variants in rare disease. We take the approach of determining the importance of point mutation loci focusing on protein structure features. The proposed structure-based features contain information about geometric, physicochemical, and functional information of mutation loci and those of structural neighbors of the loci. The performance of the structure-based features trained on 80\% of HumDiv and tested on 20\% of HumDiv and on ClinVar datasets showed high levels of discernibility in the mutation's pathogenic or benign effects: F score of 0.71 and 0.68 respectively using multi-layer perceptron. Combining structure- and sequence-based feature further improve the accuracy: F score of 0.86 (HumDiv) and 0.75 (ClinVar). Also, careful examination of the rare variants in rare diseases cases showed that structure-based features are important in discerning importance of variant loci.
Abstract:Business Intelligence and Analytics (BI&A) is the process of extracting and predicting business-critical insights from data. Traditional BI focused on data collection, extraction, and organization to enable efficient query processing for deriving insights from historical data. With the rise of big data and cloud computing, there are many challenges and opportunities for the BI. Especially with the growing number of data sources, traditional BI\&A are evolving to provide intelligence at different scales and perspectives - operational BI, situational BI, self-service BI. In this survey, we review the evolution of business intelligence systems in full scale from back-end architecture to and front-end applications. We focus on the changes in the back-end architecture that deals with the collection and organization of the data. We also review the changes in the front-end applications, where analytic services and visualization are the core components. Using a uses case from BI in Healthcare, which is one of the most complex enterprises, we show how BI\&A will play an important role beyond the traditional usage. The survey provides a holistic view of Business Intelligence and Analytics for anyone interested in getting a complete picture of the different pieces in the emerging next generation BI\&A solutions.
Abstract:MicroRNA (miRNA) are small non-coding RNAs that regulates the gene expression at the post-transcriptional level. Determining whether a sequence segment is miRNA is experimentally challenging. Also, experimental results are sensitive to the experimental environment. These limitations inspire the development of computational methods for predicting the miRNAs. We propose a deep learning based classification model, called DP-miRNA, for predicting precursor miRNA sequence that contains the miRNA sequence. The feature set based Restricted Boltzmann Machine method, which we call DP-miRNA, uses 58 features that are categorized into four groups: sequence features, folding measures, stem-loop features and statistical feature. We evaluate the performance of the DP-miRNA on eleven twelve data sets of varying species, including the human. The deep neural network based classification outperformed support vector machine, neural network, naive Baye's classifiers, k-nearest neighbors, random forests, and a hybrid system combining support vector machine and genetic algorithm.