Abstract:The electrocardiogram or ECG has been in use for over 100 years and remains the most widely performed diagnostic test to characterize cardiac structure and electrical activity. We hypothesized that parallel advances in computing power, innovations in machine learning algorithms, and availability of large-scale digitized ECG data would enable extending the utility of the ECG beyond its current limitations, while at the same time preserving interpretability, which is fundamental to medical decision-making. We identified 36,186 ECGs from the UCSF database that were 1) in normal sinus rhythm and 2) would enable training of specific models for estimation of cardiac structure or function or detection of disease. We derived a novel model for ECG segmentation using convolutional neural networks (CNN) and Hidden Markov Models (HMM) and evaluated its output by comparing electrical interval estimates to 141,864 measurements from the clinical workflow. We built a 725-element patient-level ECG profile using downsampled segmentation data and trained machine learning models to estimate left ventricular mass, left atrial volume, mitral annulus e' and to detect and track four diseases: pulmonary arterial hypertension (PAH), hypertrophic cardiomyopathy (HCM), cardiac amyloid (CA), and mitral valve prolapse (MVP). CNN-HMM derived ECG segmentation agreed with clinical estimates, with median absolute deviations (MAD) as a fraction of observed value of 0.6% for heart rate and 4% for QT interval. Patient-level ECG profiles enabled quantitative estimates of left ventricular and mitral annulus e' velocity with good discrimination in binary classification models of left ventricular hypertrophy and diastolic function. Models for disease detection ranged from AUROC of 0.94 to 0.77 for MVP. Top-ranked variables for all models included known ECG characteristics along with novel predictors of these traits/diseases.