Abstract: In recent biomedical research, a fundamental problem is to cluster a set of objects integratively using multiple source datasets. Such problems arise mostly in genomics, where data are collected from various sources that typically represent distinct yet complementary information. Integrating these data sources for multi-source clustering is challenging because of their complex dependence structure, including directional dependence. In genomics studies in particular, there is known to be a certain directional dependence among DNA expression, DNA methylation, and RNA expression, widely referred to as the Central Dogma. Most existing multi-view clustering methods assume either independence or pairwise (non-directional) dependence, thereby ignoring the directional relationship. Motivated by this, we propose a copula-based multi-view clustering model in which the copula allows the model to accommodate the directional dependence present in the datasets. We conduct a simulation experiment in which the simulated datasets exhibit inherent directional dependence; the results show that ignoring the directional dependence degrades clustering performance. As a real application, we apply our model to breast cancer tumor samples collected from The Cancer Genome Atlas (TCGA).
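A copula lets each view keep its own marginal distribution while a separate function carries the dependence between views. The abstract does not name the copula family, so the sketch below assumes a Gaussian copula with a chain structure (view 1 → view 2 → view 3) purely as an illustrative stand-in. A Gaussian copula is symmetric, so it reproduces the chain's conditional-independence pattern but does not by itself identify direction. The function names `sample_chain_gaussian_copula` and `spearman`, the correlation values, and the chosen marginals are hypothetical.

```python
import math
import random


def norm_cdf(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sample_chain_gaussian_copula(n, rho12, rho23, seed=0):
    """Draw n triples (v1, v2, v3) whose dependence comes from a Gaussian
    copula with a chain z1 -> z2 -> z3 (hypothetical stand-in for
    directional dependence across three views)."""
    rng = random.Random(seed)
    eps = 1e-12  # keeps uniforms strictly inside (0, 1) for the inverse CDFs
    rows = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho12 * z1 + math.sqrt(1.0 - rho12**2) * rng.gauss(0.0, 1.0)
        z3 = rho23 * z2 + math.sqrt(1.0 - rho23**2) * rng.gauss(0.0, 1.0)
        # Probability-integral transform: uniforms that carry only the dependence.
        u1, u2, u3 = (min(max(norm_cdf(z), eps), 1.0 - eps) for z in (z1, z2, z3))
        # Arbitrary per-view marginals via inverse CDFs (illustrative choices).
        v1 = -math.log(1.0 - u1)        # Exponential(1)
        v2 = math.log(u2 / (1.0 - u2))  # standard logistic
        v3 = u3                         # Uniform(0, 1)
        rows.append((v1, v2, v3))
    return rows


def ranks(xs):
    # Rank positions 0..n-1 (continuous data, so ties are not expected).
    order = sorted(range(len(xs)), key=xs.__getitem__)
    r = [0] * len(xs)
    for rank, idx in enumerate(order):
        r[idx] = rank
    return r


def spearman(xs, ys):
    # Pearson correlation of the ranks (no-ties case).
    rx, ry = ranks(xs), ranks(ys)
    mean = (len(xs) - 1) / 2.0
    cov = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    var = sum((a - mean) ** 2 for a in rx)
    return cov / var


rows = sample_chain_gaussian_copula(20000, rho12=0.8, rho23=0.7, seed=1)
v1, v2, v3 = zip(*rows)
print(spearman(v1, v2), spearman(v1, v3))
```

Because rank correlation is invariant to monotone transforms of each marginal, Spearman's ρ between two views should stay close to the Gaussian-copula value (6/π)·arcsin(ρ/2) whatever marginals are plugged in, with ρ = 0.8 × 0.7 = 0.56 for views 1 and 3 under the chain. This is the sense in which the copula isolates the dependence from the marginals.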
Abstract: The recent introduction of wearable single-lead ECG devices in diverse configurations has attracted the interest of the medical community. While these devices offer caregivers a highly affordable tool for continuous monitoring and for detecting acute conditions such as arrhythmia, their utility for cardiac diagnostics remains limited. This is because the clinical diagnosis of many cardiac pathologies relies on patterns gleaned from synchronous 12-lead ECG. If synchronous 12-lead signals of clinical quality could be synthesized from these single-lead devices, cardiac care could be transformed by substantially reducing costs and enhancing access to cardiac diagnostics. However, prior attempts to synthesize synchronous 12-lead ECG have not been successful: vectorcardiography (VCG) analysis suggests that the cardiac axis synthesized by earlier attempts deviates significantly from that estimated from 12-lead and/or Frank lead measurements. This work is perhaps the first successful attempt to synthesize clinically equivalent synchronous 12-lead ECG from single-lead ECG. Our method employs a random forest machine learning model that uses a subject's historical 12-lead recordings to estimate the morphology, including the actual timing of various ECG events (relative to the measured single-lead ECG), of all 11 missing leads for that subject. The method was validated on two benchmark datasets, as well as on paper ECG and AliveCor-Kardia data obtained from the Heart, Artery, and Vein Center of Fresno, California. Results suggest that this approach can synthesize synchronous ECG with accuracies (R²) exceeding 90%. Accurate synthesis of 12-lead ECG from a single-lead device can ultimately enable its wider application and improved point-of-care (POC) diagnostics.
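The abstract reports synthesis accuracy as R². A minimal sketch of the standard coefficient of determination, R² = 1 − SS_res / SS_tot, assuming it is computed per lead by comparing the synthesized waveform sample by sample with the measured one. The function name `r_squared` and the toy values are hypothetical, and the abstract does not say how R² is aggregated across leads or subjects.

```python
def r_squared(measured, synthesized):
    """Coefficient of determination of a synthesized lead against the
    measured lead: R^2 = 1 - SS_res / SS_tot (hypothetical helper)."""
    if len(measured) != len(synthesized) or len(measured) < 2:
        raise ValueError("need two equal-length signals with at least 2 samples")
    mean = sum(measured) / len(measured)
    # Total variation of the measured lead around its own mean.
    ss_tot = sum((y - mean) ** 2 for y in measured)
    # Residual variation left unexplained by the synthesized lead.
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(measured, synthesized))
    return 1.0 - ss_res / ss_tot


# Toy example with made-up sample values (not ECG data from the study).
measured = [1.0, 2.0, 3.0, 4.0]
synthesized = [1.1, 1.9, 3.2, 3.8]
print(r_squared(measured, synthesized))
```

A perfect synthesis gives R² = 1, while predicting the measured lead's mean at every sample gives R² = 0, so "R² exceeding 90%" means the synthesized leads explain more than 90% of the sample-wise variance of the measured ones under this definition.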
Abstract: The accuracy of survival models for life-expectancy prediction and critical-care applications is significantly compromised by the sparsity of samples and by the extreme imbalance between the survival (usually the majority) and mortality class sizes. While the recent random survival forest (RSF) model overcomes the limitations of the proportional hazards assumption, imbalance in the data leads to underestimation (overestimation) of the hazard of the mortality (survival) class. A balanced random survival forests (BRSF) model, which trains the RSF model on data generated by a synthetic minority sampling scheme, is presented to address this gap. Theoretical results on the effect of balancing on prediction accuracy in BRSF are reported. Benchmarking studies were conducted using five public datasets with different levels of class imbalance and an imbalanced dataset of 267 acute cardiac patients collected at the Heart, Artery, and Vein Center of Fresno, CA. The investigations suggest that BRSF provides improved discriminatory strength between the survival and mortality classes. It outperformed both optimized Cox models (with and without balancing) and RSF, with an average reduction of 55% in prediction error relative to the next-best alternative.
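The abstract does not specify the synthetic minority sampling scheme; a common choice is SMOTE, which places each synthetic point on the segment between a minority sample and one of its nearest minority neighbours. The sketch below assumes SMOTE-style interpolation in plain Python. The helper names `smote` and `balance` are hypothetical, and how BRSF handles censoring and survival times is not described here; treating the survival time as one more interpolated coordinate is only one hypothetical option.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """SMOTE-style oversampling (assumed scheme): each synthetic point lies on
    the segment between a minority sample and one of its k nearest minority
    neighbours. Rows are tuples of numbers, e.g. (feature_1, ..., time)."""
    if n_new <= 0:
        return []
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x (Euclidean), excluding x itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )[:k]
        nn = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic


def balance(majority, minority, k=5, seed=0):
    """Oversample the minority (e.g. mortality) class until both classes
    have equal size; returns the original minority rows plus synthetic ones."""
    n_new = max(0, len(majority) - len(minority))
    return list(minority) + smote(minority, n_new, k=k, seed=seed)


# Toy two-dimensional rows (made-up values, not patient data).
majority = [(float(i), 0.0) for i in range(20)]
minority = [(0.0, 1.0), (1.0, 2.0), (2.0, 1.5), (0.5, 3.0)]
balanced_minority = balance(majority, minority, k=2, seed=3)
print(len(balanced_minority))
```

In a BRSF-like pipeline, the balanced minority rows would then be pooled with the majority rows before fitting the survival forest; that training step is outside the scope of this sketch.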