Abstract:Testis size is known to be one of the main predictors of male fertility, usually assessed in clinical workup via palpation or imaging. Despite its potential, population-level evaluation of testicular volume using imaging remains underexplored. Previous studies, limited by small and biased datasets, have demonstrated the feasibility of machine learning for testis volume segmentation. This paper presents an evaluation of segmentation methods for testicular volume using Magnet Resonance Imaging data from the UKBiobank. The best model achieves a median dice score of $0.87$, compared to median dice score of $0.83$ for human interrater reliability on the same dataset, enabling large-scale annotation on a population scale for the first time. Our overall aim is to provide a trained model, comparative baseline methods, and annotated training data to enhance accessibility and reproducibility in testis MRI segmentation research.