Abstract: Radiology report generation (RRG) aims to create free-text radiology reports from clinical imaging. Grounded radiology report generation (GRRG) extends RRG by including the localization of individual findings on the image. Currently, there are no manually annotated chest X-ray (CXR) datasets for training GRRG models. In this work, we present a dataset called PadChest-GR (Grounded-Reporting), derived from PadChest, aimed at training GRRG models for CXR images. We curate a public bilingual dataset of 4,555 CXR studies with grounded reports (3,099 abnormal and 1,456 normal), each containing complete lists of sentences describing individual present (positive) and absent (negative) findings in English and Spanish. In total, PadChest-GR contains 7,037 positive and 3,422 negative finding sentences. Every positive finding sentence is associated with up to two independent sets of bounding boxes labelled by different readers and has categorical labels for finding type, location, and progression. To the best of our knowledge, PadChest-GR is the first manually curated dataset designed to train GRRG models that ground their generated text in the underlying radiological images. By including detailed localization and comprehensive annotations of all clinically relevant findings, it provides a valuable resource for developing and evaluating GRRG models on CXR images. PadChest-GR can be downloaded upon request from https://bimcv.cipf.es/bimcv-projects/padchest-gr/
Abstract: In this work we present a method for the detection of radiological findings, their locations, and differential diagnoses from chest X-rays. Unlike prior works that focus on the detection of a few pathologies, we use a hierarchical taxonomy mapped to the Unified Medical Language System (UMLS) terminology to identify 189 radiological findings, 22 differential diagnoses, and 122 anatomic locations, including ground-glass opacities, infiltrates, consolidations, and other radiological findings compatible with COVID-19. We train the system on a large database of 92,594 frontal chest X-rays (AP or PA; standing, supine, or decubitus) and a second database of 2,065 frontal images of COVID-19 patients identified by at least one positive Polymerase Chain Reaction (PCR) test. The reference labels are obtained through natural language processing of the radiological reports. On 23,159 test images, the proposed neural network obtains an AUC of 0.94 for the diagnosis of COVID-19. To our knowledge, this work uses the largest chest X-ray dataset of COVID-19-positive cases to date and is the first to use a hierarchical labeling schema and to provide interpretability of the results, not only through network attention methods but also by indicating the radiological findings that led to the diagnosis.