Abstract:De-identification of medical images is a critical step to ensure privacy during data sharing in research and clinical settings. The initial step in this process involves detecting Protected Health Information (PHI), which can be found in image metadata or imprinted within image pixels. Despite the importance of such systems, there has been limited evaluation of existing AI-based solutions, creating barriers to the development of reliable and robust tools. In this study, we present an AI-based pipeline for PHI detection, comprising three key components: text detection, text extraction, and analysis of PHI content in medical images. By experimenting with exchanging roles of vision and language models within the pipeline, we evaluate the performance and recommend the best setup for the PHI detection task.
Abstract:Image-to-image translation can create large impact in medical imaging, i.e. if images of a patient can be translated to another modality, type or sequence for better diagnosis. However, these methods must be validated by human reader studies, which are costly and restricted to small samples. Automatic evaluation of large samples to pre-evaluate and continuously improve methods before human validation is needed. In this study, we give an overview of reference and non-reference metrics for image synthesis assessment and investigate the ability of nine metrics, that need a reference (SSIM, MS-SSIM, PSNR, MSE, NMSE, MAE, LPIPS, NMI and PCC) and three non-reference metrics (BLUR, MSN, MNG) to detect 11 kinds of distortions in MR images from the BraSyn dataset. In addition we test a downstream segmentation metric and the effect of three normalization methods (Minmax, cMinMax and Zscore). Although PSNR and SSIM are frequently used to evaluate generative models for image-to-image-translation tasks in the medical domain, they show very specific shortcomings. SSIM ignores blurring but is very sensitive to intensity shifts in unnormalized MR images. PSNR is even more sensitive to different normalization methods and hardly measures the degree of distortions. Further metrics, such as LPIPS, NMI and DICE can be very useful to evaluate other similarity aspects. If the images to be compared are misaligned, most metrics are flawed. By carefully selecting and reasonably combining image similarity metrics, the training and selection of generative models for MR image synthesis can be improved. Many aspects of their output can be validated before final and costly evaluation by trained radiologists is conducted.