Abstract:Background and Objective: Biometric measurements of fetal head are important indicators for maternal and fetal health monitoring during pregnancy. 3D ultrasound (US) has unique advantages over 2D scan in covering the whole fetal head and may promote the diagnoses. However, automatically segmenting the whole fetal head in US volumes still pends as an emerging and unsolved problem. The challenges that automated solutions need to tackle include the poor image quality, boundary ambiguity, long-span occlusion, and the appearance variability across different fetal poses and gestational ages. In this paper, we propose the first fully-automated solution to segment the whole fetal head in US volumes. Methods: The segmentation task is firstly formulated as an end-to-end volumetric mapping under an encoder-decoder deep architecture. We then combine the segmentor with a proposed hybrid attention scheme (HAS) to select discriminative features and suppress the non-informative volumetric features in a composite and hierarchical way. With little computation overhead, HAS proves to be effective in addressing boundary ambiguity and deficiency. To enhance the spatial consistency in segmentation, we further organize multiple segmentors in a cascaded fashion to refine the results by revisiting context in the prediction of predecessors. Results: Validated on a large dataset collected from 100 healthy volunteers, our method presents superior segmentation performance (DSC (Dice Similarity Coefficient), 96.05%), remarkable agreements with experts. With another 156 volumes collected from 52 volunteers, we ahieve high reproducibilities (mean standard deviation 11.524 mL) against scan variations. Conclusion: This is the first investigation about whole fetal head segmentation in 3D US. Our method is promising to be a feasible solution in assisting the volumetric US-based prenatal studies.
Abstract:The 3D ultrasound (US) entrance inspires a multitude of automated prenatal examinations. However, studies about the structuralized description of the whole fetus in 3D US are still rare. In this paper, we propose to estimate the 3D pose of fetus in US volumes to facilitate its quantitative analyses in global and local scales. Given the great challenges in 3D US, including the high volume dimension, poor image quality, symmetric ambiguity in anatomical structures and large variations of fetal pose, our contribution is three-fold. (i) This is the first work about 3D pose estimation of fetus in the literature. We aim to extract the skeleton of whole fetus and assign different segments/joints with correct torso/limb labels. (ii) We propose a self-supervised learning (SSL) framework to finetune the deep network to form visually plausible pose predictions. Specifically, we leverage the landmark-based registration to effectively encode case-adaptive anatomical priors and generate evolving label proxy for supervision. (iii) To enable our 3D network perceive better contextual cues with higher resolution input under limited computing resource, we further adopt the gradient check-pointing (GCP) strategy to save GPU memory and improve the prediction. Extensively validated on a large 3D US dataset, our method tackles varying fetal poses and achieves promising results. 3D pose estimation of fetus has potentials in serving as a map to provide navigation for many advanced studies.
Abstract:Volumetric ultrasound has great potentials in promoting prenatal examinations. Automated solutions are highly desired to efficiently and effectively analyze the massive volumes. Segmentation and landmark localization are two key techniques in making the quantitative evaluation of prenatal ultrasound volumes available in clinic. However, both tasks are non-trivial when considering the poor image quality, boundary ambiguity and anatomical variations in volumetric ultrasound. In this paper, we propose an effective framework for simultaneous segmentation and landmark localization in prenatal ultrasound volumes. The proposed framework has two branches where informative cues of segmentation and landmark localization can be propagated bidirectionally to benefit both tasks. As landmark localization tends to suffer from false positives, we propose a distance based loss to suppress the noise and thus enhance the localization map and in turn the segmentation. Finally, we further leverage an adversarial module to emphasize the correspondence between segmentation and landmark localization. Extensively validated on a volumetric ultrasound dataset of fetal femur, our proposed framework proves to be a promising solution to facilitate the interpretation of prenatal ultrasound volumes.
Abstract:Deep neural networks are state-of-the-art models for understanding the content of images, video and raw input data. However, implementing a deep neural network in embedded systems is a challenging task, because a typical deep neural network, such as a Deep Belief Network using 128x128 images as input, could exhaust Giga bytes of memory and result in bandwidth and computing bottleneck. To address this challenge, this paper presents a hardware-oriented deep learning algorithm, named as the Deep Adaptive Network, which attempts to exploit the sparsity in the neural connections. The proposed method adaptively reduces the weights associated with negligible features to zero, leading to sparse feedforward network architecture. Furthermore, since the small proportion of important weights are significantly larger than zero, they can be robustly thresholded and represented using single-bit integers (-1 and +1), leading to implementations of deep neural networks with sparse and binary connections. Our experiments showed that, for the application of recognizing MNIST handwritten digits, the features extracted by a two-layer Deep Adaptive Network with about 25% reserved important connections achieved 97.2% classification accuracy, which was almost the same with the standard Deep Belief Network (97.3%). Furthermore, for efficient hardware implementations, the sparse-and-binary-weighted deep neural network could save about 99.3% memory and 99.9% computation units without significant loss of classification accuracy for pattern recognition applications.