*: shared first/last authors
Abstract:Purpose: Deformable image registration (DIR) is critical in adaptive radiation therapy (ART) to account for anatomical changes. Conventional intensity-based DIR methods often fail when image intensities differ. This study evaluates a hybrid similarity metric combining intensity and structural information, leveraging CycleGAN-based intensity correction and auto-segmentation across three DIR workflows. Methods: A hybrid similarity metric combining a point-to-distance (PD) score and intensity similarity was implemented. Synthetic CT (sCT) images were generated using a 2D CycleGAN model trained on unpaired CT and CBCT images to enhance soft-tissue contrast. DIR workflows compared included: (1) traditional intensity-based (No PD), (2) auto-segmented contours on sCT (CycleGAN PD), and (3) expert manual contours (Expert PD). A 3D U-Net model trained on 56 images and validated on 14 cases segmented the prostate, bladder, and rectum. DIR accuracy was assessed using Dice Similarity Coefficient (DSC), 95% Hausdorff Distance (HD), and fiducial separation. Results: The hybrid metric improved DIR accuracy. For the prostate, DSC increased from 0.61+/-0.18 (No PD) to 0.82+/-0.13 (CycleGAN PD) and 0.89+/-0.05 (Expert PD), with reductions in 95% HD from 11.75 mm to 4.86 mm and 3.27 mm, respectively. Fiducial separation decreased from 8.95 mm to 4.07 mm (CycleGAN PD) and 4.11 mm (Expert PD) (p < 0.05). Improvements were also observed for the bladder and rectum. Conclusion: This study demonstrates that a hybrid similarity metric using CycleGAN-based auto-segmentation improves DIR accuracy, particularly for low-contrast CBCT images. These findings highlight the potential for integrating AI-based image correction and segmentation into ART workflows to enhance precision and streamline clinical processes.
Abstract:International challenges have become the standard for validation of biomedical image analysis methods. Given their scientific impact, it is surprising that a critical analysis of common practices related to the organization of challenges has not yet been performed. In this paper, we present a comprehensive analysis of biomedical image analysis challenges conducted up to now. We demonstrate the importance of challenges and show that the lack of quality control has critical consequences. First, reproducibility and interpretation of the results is often hampered as only a fraction of relevant information is typically provided. Second, the rank of an algorithm is generally not robust to a number of variables such as the test data used for validation, the ranking scheme applied and the observers that make the reference annotations. To overcome these problems, we recommend best practice guidelines and define open research questions to be addressed in the future.