Abstract:Deep learning-based medical image segmentation has seen tremendous progress over the last decade, but there is still relatively little transfer into clinical practice. One of the main barriers is the challenge of domain generalisation, which requires segmentation models to maintain high performance across a wide distribution of image data. This challenge is amplified by the many factors that contribute to the diverse appearance of medical images, such as acquisition conditions and patient characteristics. The impact of shifting patient characteristics such as age and sex on segmentation performance remains relatively under-studied, especially for abdominal organs, despite that this is crucial for ensuring the fairness of the segmentation model. We perform the first study to determine the impact of population shift with respect to age and sex on abdominal CT image segmentation, by leveraging two large public datasets, and introduce a novel metric to quantify the impact. We find that population shift is a challenge similar in magnitude to cross-dataset shift for abdominal organ segmentation, and that the effect is asymmetric and dataset-dependent. We conclude that dataset diversity in terms of known patient characteristics is not necessarily equivalent to dataset diversity in terms of image features. This implies that simple population matching to ensure good generalisation and fairness may be insufficient, and we recommend that fairness research should be directed towards better understanding and quantifying medical image dataset diversity in terms of performance-relevant characteristics such as organ morphology.