Abstract: There is no consensus on radiomic feature terminology, the underlying mathematics, or their implementation. As a result, features extracted with different toolboxes cannot be used to build or validate the same model, which limits the generalizability of radiomic results. In this study, the phantom and benchmark values established by the Image Biomarker Standardization Initiative (IBSI) were used to compare the variation of radiomic features across 6 publicly available software programs and 1 in-house radiomics pipeline. All IBSI-standardized features (11 classes, 173 in total) were extracted. The relative differences between the feature values extracted by each software program and the IBSI benchmark values were calculated to measure inter-software agreement. To better understand the variations, features were further grouped into 3 categories according to their properties: 1) morphology, 2) statistic/histogram, and 3) texture features. While good agreement was observed for the majority of radiomic features across the various programs, relatively poor agreement was observed for morphology features. Significant differences were also found among programs that use different gray level discretization approaches. Because these programs do not include all IBSI features, the level of quantitative assessment for each category was analyzed using Venn and UpSet diagrams and quantified using two ad hoc metrics. Morphology features earned the lowest scores on both metrics, indicating that morphological features are not consistently evaluated among software programs. We conclude that radiomic features calculated using different software programs may not be identical or reliable. Further studies are needed to standardize the workflow of radiomic feature extraction.
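The relative-difference comparison against benchmark values can be illustrated with a minimal sketch. The abstract does not state the exact formula, so the signed percent difference below is an assumption (one common definition), and the function names, feature names, and numbers are hypothetical placeholders, not IBSI benchmark values or results from the study.

```python
def relative_difference(extracted, benchmark):
    """Signed relative difference (in percent) of an extracted value from its benchmark.

    Assumed definition: 100 * (extracted - benchmark) / |benchmark|.
    """
    if benchmark == 0:
        raise ValueError("relative difference is undefined for a zero benchmark")
    return 100.0 * (extracted - benchmark) / abs(benchmark)


def agreement_table(extracted_features, benchmark_features):
    """Relative difference for every feature present in both dictionaries.

    Features that a program does not implement are simply absent from the
    result, mirroring the fact that not every program covers every feature.
    """
    shared = extracted_features.keys() & benchmark_features.keys()
    return {
        name: relative_difference(extracted_features[name], benchmark_features[name])
        for name in sorted(shared)
    }


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    benchmark = {"volume": 100.0, "mean_intensity": 2.0, "glcm_contrast": 5.0}
    program_a = {"volume": 110.0, "mean_intensity": 2.0}
    for name, diff in agreement_table(program_a, benchmark).items():
        print(f"{name}: {diff:+.1f}%")
```

Under this definition, a value of 0% means the program reproduces the benchmark exactly, and a tolerance threshold on the absolute value could be used to flag disagreement.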
Abstract: Our main objective is to develop a novel deep learning (DL)-based algorithm for automatic segmentation of the prostate zones and to evaluate the proposed algorithm on an additional independent testing dataset in comparison with the inter-reader consistency between two experts. With IRB approval and HIPAA compliance, we designed a novel convolutional neural network (CNN) for automatic segmentation of the prostatic transition zone (TZ) and peripheral zone (PZ) on T2-weighted (T2w) MRI. The total study cohort included 359 patients from two sources: 313 from a deidentified publicly available dataset (SPIE-AAPM-NCI PROSTATEX challenge) and 46 from a large U.S. tertiary referral center with 3T MRI (external testing dataset (ETD)). The TZ and PZ contours were manually annotated by research fellows, supervised by genitourinary (GU) radiologists. The model was developed using 250 patients from PROSTATEX, tested internally on the remaining 63 PROSTATEX patients (internal testing dataset (ITD)), and tested externally on the ETD (n=46). The Dice Similarity Coefficient (DSC) was used to evaluate segmentation performance. In the ITD, DSCs for PZ and TZ were 0.74 and 0.86, respectively. In the ETD, DSCs for PZ and TZ were 0.74 and 0.792, respectively. The inter-reader consistency (Expert 2 vs. Expert 1) was 0.71 (PZ) and 0.75 (TZ). This novel DL algorithm enabled automatic segmentation of PZ and TZ with high accuracy on both the ITD and ETD, with no performance difference for PZ and a difference of less than 10% for TZ. In the ETD, the proposed method is comparable to experts in the segmentation of prostate zones.
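For readers unfamiliar with the metric, the Dice Similarity Coefficient between two binary masks A and B is 2|A ∩ B| / (|A| + |B|). The sketch below is a minimal illustration of that standard definition, not the authors' evaluation code; the function name, the flat 0/1 mask representation, and the convention of returning 1.0 when both masks are empty are assumptions.

```python
def dice_coefficient(pred, truth):
    """DSC = 2|A ∩ B| / (|A| + |B|) for two binary masks.

    Masks are flat sequences of 0/1 (or False/True) of equal length,
    e.g. a flattened segmentation of one zone (PZ or TZ).
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labeled foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Sum of the foreground sizes of each mask.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    # Toy 4-voxel masks for illustration only.
    model_mask = [1, 1, 0, 0]
    expert_mask = [1, 0, 1, 0]
    print(dice_coefficient(model_mask, expert_mask))
```

The same function applies to model-versus-expert and expert-versus-expert comparisons, which is what allows segmentation performance and inter-reader consistency to be reported on one scale.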