Abstract:Human breath contains hundreds of volatile molecules that can provide powerful, non-intrusive spectral diagnosis of a diverse set of diseases and physiological/metabolic states. To unleash this tremendous potential for medical science, we present a robust analytical method that simultaneously measures tens of thousands of spectral features in each breath sample, followed by efficient and detail-specific multivariate data analysis for unambiguous binary medical response classification. We combine mid-infrared cavity-enhanced direct frequency comb spectroscopy (CE-DFCS), capable of real-time collection of tens of thousands of distinct molecular features at parts-per-trillion sensitivity, with supervised machine learning, capable of analysis and verification of extremely high-dimensional input data channels. Here, we present the first application of this method to the breath detection of Coronavirus Disease 2019 (COVID-19). Using 170 individual samples at the University of Colorado, we report a cross-validated area under the Receiver-Operating-Characteristics curve of 0.849(4), providing excellent prediction performance. Further, this method detected a significant difference between male and female breath as well as other variables such as smoking and abdominal pain. Together, these highlight the utility of CE-DFCS for rapid, non-invasive detection of diverse biological conditions and disease states. The unique properties of frequency comb spectroscopy thus help establish precise digital spectral fingerprints for building accurate databases and provide means for simultaneous multi-response classifications. The predictive power can be further enhanced with readily scalable comb spectral coverage.