Abstract: More than two years after the beginning of the COVID-19 pandemic, the crisis continues to exert devastating pressure worldwide. The use of chest X-ray (CXR) imaging as a screening strategy complementary to RT-PCR testing not only remains prevalent but has greatly increased, owing to its routine clinical use for respiratory complaints. Many visual perception models have been proposed so far for COVID-19 screening based on CXR imaging. However, the accuracy and generalization capacity of these models depend heavily on the diversity and size of the datasets they were trained on. Motivated by this, we introduce COVIDx CXR-3, a large-scale benchmark dataset of CXR images to support COVID-19 computer vision research. COVIDx CXR-3 comprises 30,386 CXR images from a multinational cohort of 17,026 patients from at least 51 countries, making it, to the best of our knowledge, the most extensive and most diverse open-access COVID-19 CXR dataset. Here, we provide comprehensive details on various aspects of the proposed dataset, including patient demographics, imaging views, and infection types. We hope that COVIDx CXR-3 can help scientists advance computer vision research against the COVID-19 pandemic.
Abstract: Computed tomography (CT) has been widely explored as a COVID-19 screening and assessment tool to complement RT-PCR testing. To assist radiologists with CT-based COVID-19 screening, a number of computer-aided systems have been proposed; however, many of these systems are built using CT data that are limited in both quantity and diversity. To support the development of machine learning-driven screening systems, we introduce COVIDx CT-3, a large-scale multinational benchmark dataset for detecting COVID-19 cases from chest CT images. COVIDx CT-3 includes 431,205 CT slices from 6,068 patients across at least 17 countries, which, to the best of our knowledge, makes it the largest and most diverse open-access dataset of COVID-19 CT images. Additionally, we examine the data diversity and potential biases of the COVIDx CT-3 dataset, finding that significant geographic and class imbalances remain despite efforts to curate data from a wide variety of sources.