Abstract: Image features need to be robust against differences in positioning, acquisition and segmentation to ensure reproducibility. Radiomic models that include only robust features can be used to analyse new images, whereas models with non-robust features may fail to predict the outcome of interest accurately. Test-retest imaging is recommended to assess robustness, but may not be available for the phenotype of interest. We therefore investigated 18 methods to determine feature robustness based on image perturbations. Test-retest and perturbation robustness were compared for 4032 features that were computed from the gross tumour volume in two cohorts with computed tomography imaging: (I) 31 non-small-cell lung cancer (NSCLC) patients and (II) 19 head-and-neck squamous cell carcinoma (HNSCC) patients. Robustness was measured using the intraclass correlation coefficient (1,1) (ICC). Features with ICC $\geq 0.90$ were considered robust. The NSCLC cohort contained more features that were robust in test-retest imaging than the HNSCC cohort ($73.5\%$ vs. $34.0\%$). A perturbation chain consisting of noise addition, affine translation, volume growth/shrinkage and supervoxel-based contour randomisation identified the fewest false-positive robust features (NSCLC: $3.3\%$; HNSCC: $10.0\%$). Thus, this perturbation chain may be used to assess feature robustness.
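To illustrate the robustness criterion, the following is a minimal sketch of ICC(1,1) using the standard one-way random-effects, single-measurement definition, $\mathrm{ICC} = (MS_B - MS_W) / (MS_B + (k-1)\,MS_W)$, followed by the $0.90$ threshold. The function names `icc_1_1` and `is_robust`, the array layout (one row per patient, one column per repeated measurement or perturbation) and the toy values are assumptions made for illustration; they are not taken from the study's implementation.

```python
import numpy as np

ROBUST_THRESHOLD = 0.90  # cut-off stated in the abstract


def icc_1_1(values):
    """ICC(1,1) for one feature.

    values: array of shape (n_subjects, k_measurements), e.g. one row per
    patient and one column per test/retest scan or perturbed image.
    """
    x = np.asarray(values, dtype=float)
    n, k = x.shape
    subject_means = x.mean(axis=1)
    grand_mean = x.mean()
    # Between-subject mean square (n - 1 degrees of freedom).
    ms_between = k * np.sum((subject_means - grand_mean) ** 2) / (n - 1)
    # Within-subject mean square (n * (k - 1) degrees of freedom).
    ms_within = np.sum((x - subject_means[:, None]) ** 2) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def is_robust(values, threshold=ROBUST_THRESHOLD):
    """Return True if the feature's ICC(1,1) meets the threshold."""
    return icc_1_1(values) >= threshold


# Hypothetical toy data: 3 patients, 2 measurements each.
stable_feature = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
unstable_feature = [[1.0, 2.0], [2.0, 1.0]]
print(is_robust(stable_feature), is_robust(unstable_feature))
```

In this layout the same function can be applied to test-retest pairs ($k = 2$) or to a larger set of perturbed images per patient, so both kinds of robustness are scored on the same scale.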