Abstract: Ultrasound (US) imaging is a cost-effective and versatile tool for real-time diagnosis; however, determination of fetal gender by US scan in the early stages of pregnancy is also a cause of sex-selective abortion. This work proposes a deep learning object detection approach that accurately masks fetal gender in US images, with the aim of increasing the accessibility of the technology. We demonstrate that the YOLOv5L architecture outperforms other object detection models on this task. Our model achieves 45.8% AP[0.5:0.95], a 92% F1-score, and a rate of 0.006 false positives per image on our test set. Furthermore, we introduce a bounding box delay rule based on frame-to-frame structural similarity that reduces the false negative rate by 85%, further improving masking reliability.
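
The abstract does not spell out the bounding box delay rule, so the following is only an illustrative sketch of one way such a rule could work, not the authors' implementation. It assumes that when the detector misses a frame, the most recent box is carried forward as long as the current frame is structurally similar to the previous one. The function names (`global_ssim`, `apply_delay_rule`), the similarity threshold, the `max_hold` limit, and the use of a single-window SSIM over the whole frame (rather than the usual sliding-window SSIM) are all assumptions made for illustration.

```python
import numpy as np


def global_ssim(a, b, data_range=255.0):
    """Single-window SSIM computed over the whole frame.

    Assumption: a simplified stand-in for windowed SSIM, used here
    only to keep the sketch dependency-free.
    """
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    c1 = (0.01 * data_range) ** 2  # standard SSIM stabilising constants
    c2 = (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator


def apply_delay_rule(frames, detections, ssim_threshold=0.9, max_hold=5):
    """Fill detector misses by holding the last box on similar frames.

    frames:      sequence of 2-D grayscale arrays
    detections:  sequence of boxes (any object) or None for a miss
    Hypothetical parameters: ssim_threshold and max_hold are
    illustrative values, not taken from the paper.
    """
    masked = []
    last_box, held, prev = None, 0, None
    for frame, box in zip(frames, detections):
        if box is not None:
            # Fresh detection: remember it and reset the hold counter.
            last_box, held = box, 0
        elif (
            last_box is not None
            and prev is not None
            and held < max_hold
            and global_ssim(prev, frame) >= ssim_threshold
        ):
            # Miss on a frame that resembles the previous one: reuse the box.
            box = last_box
            held += 1
        else:
            # Scene changed or hold budget exhausted: stop carrying the box.
            last_box, held = None, 0
        masked.append(box)
        prev = frame
    return masked
```

In this sketch, a missed detection on a near-identical frame still yields a mask, which is the mechanism by which a rule of this kind could lower the false negative rate; the hold limit keeps a stale box from persisting indefinitely.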