Abstract: Fast and affordable COVID-19 testing is necessary to contain the spread of the global pandemic and to relieve the burden on medical facilities. Currently, limited testing locations and expensive equipment make it difficult for individuals to get tested, especially in low-resource settings. Researchers have presented models that detect COVID-19 infection status from audio samples recorded in clinical settings [5, 15], suggesting that audio-based Artificial Intelligence models can be used to identify COVID-19. Such models could be deployed on smartphones for fast, widespread, and low-resource testing. However, previous studies trained their models on cleaned audio samples collected mainly in clinical settings, whereas audio recorded on average smartphones may be of lower quality and differ from that clean training data. This discrepancy may introduce a bias that affects COVID-19 status predictions. To tackle this issue, we propose a multi-branch deep learning network that is trained and tested on crowdsourced data, most of which has not been manually processed or cleaned. The model achieves state-of-the-art results on the COUGHVID dataset [16]. Broken down by category, the results show an AUC of 0.99 for audio samples with COVID-19 positive labels.
Abstract: Rapid and affordable methods of testing for COVID-19 infection are essential to reduce infection rates and prevent medical facilities from becoming overwhelmed. Current approaches to detecting COVID-19 require in-person testing with expensive kits that are not always easily accessible. This study demonstrates that crowdsourced cough audio samples recorded on smartphones around the world can be used to develop an AI-based method that predicts COVID-19 infection with an ROC-AUC of 77.1% (75.2%-78.3%). Furthermore, we show that our method generalizes to crowdsourced audio samples from Latin America and to clinical samples from South Asia without further training on samples from those regions. As more crowdsourced data is collected, the method can be developed further with a wider variety of respiratory audio samples, toward a cough analysis-based machine learning (ML) solution for COVID-19 detection that can likely generalize globally to all demographic groups in both clinical and non-clinical settings.
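For readers unfamiliar with the metric, the following is a minimal sketch of how an ROC-AUC and an accompanying interval can be computed from binary labels and model scores. It is an illustration only: the abstract does not state how its interval was derived, so the percentile bootstrap used here is an assumption, and the function names `roc_auc` and `bootstrap_auc_interval` and the toy data are hypothetical.

```python
import random


def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs in which the
    positive sample has the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_interval(labels, scores, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for ROC-AUC (assumed method, see lead-in)."""
    roc_auc(labels, scores)  # validates that both classes are present
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_resamples:
        # Resample (label, score) pairs with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain a single class
        aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    lower = aucs[int((alpha / 2) * (n_resamples - 1))]
    upper = aucs[int((1 - alpha / 2) * (n_resamples - 1))]
    return lower, upper


# Toy data (hypothetical): 1 = COVID-19 positive, 0 = negative.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
toy_interval = bootstrap_auc_interval(toy_labels, toy_scores, n_resamples=200)
```

On the toy data, three of the four positive-negative pairs are ranked correctly, so `toy_auc` is 0.75. The pairwise formulation is quadratic in the number of samples; it is chosen here for readability rather than speed.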