Engineering Mathematics, University of Bristol, affiliated with the Bristol Robotics Lab, United Kingdom
Abstract:3D pose estimation from a 2D cross-sectional view enables healthcare professionals to navigate through the 3D space, and such techniques initiate automatic guidance in many image-guided radiology applications. In this work, we investigate how estimating 3D fetal pose from freehand 2D ultrasound scanning can guide a sonographer to locate a head standard plane. Fetal head pose is estimated by the proposed Pose-GuideNet, a novel 2D/3D registration approach to align freehand 2D ultrasound to a 3D anatomical atlas without the acquisition of 3D ultrasound. To facilitate the 2D to 3D cross-dimensional projection, we exploit the prior knowledge in the atlas to align the standard plane frame in a freehand scan. A semantic-aware contrastive-based approach is further proposed to align the frames that are off standard planes based on their anatomical similarity. In the experiment, we enhance the existing assessment of freehand image localization by comparing the transformation of its estimated pose towards standard plane with the corresponding probe motion, which reflects the actual view change in 3D anatomy. Extensive results on two clinical head biometry tasks show that Pose-GuideNet not only accurately predicts pose but also successfully predicts the direction of the fetal head. Evaluations with probe motions further demonstrate the feasibility of adopting Pose-GuideNet for freehand ultrasound-assisted navigation in a sensor-free environment.
Abstract:We present the first automated multimodal summary generation system, MMSummary, for medical imaging video, particularly with a focus on fetal ultrasound analysis. Imitating the examination process performed by a human sonographer, MMSummary is designed as a three-stage pipeline, progressing from keyframe detection to keyframe captioning and finally anatomy segmentation and measurement. In the keyframe detection stage, an innovative automated workflow is proposed to progressively select a concise set of keyframes, preserving sufficient video information without redundancy. Subsequently, we adapt a large language model to generate meaningful captions for fetal ultrasound keyframes in the keyframe captioning stage. If a keyframe is captioned as fetal biometry, the segmentation and measurement stage estimates biometric parameters by segmenting the region of interest according to the textual prior. The MMSummary system provides comprehensive summaries for fetal ultrasound examinations and based on reported experiments is estimated to reduce scanning time by approximately 31.5%, thereby suggesting the potential to enhance clinical workflow efficiency.
Abstract:Unsupervised anomaly segmentation approaches to pathology segmentation train a model on images of healthy subjects, that they define as the 'normal' data distribution. At inference, they aim to segment any pathologies in new images as 'anomalies', as they exhibit patterns that deviate from those in 'normal' training data. Prevailing methods follow the 'corrupt-and-reconstruct' paradigm. They intentionally corrupt an input image, reconstruct it to follow the learned 'normal' distribution, and subsequently segment anomalies based on reconstruction error. Corrupting an input image, however, inevitably leads to suboptimal reconstruction even of normal regions, causing false positives. To alleviate this, we propose a novel iterative spatial mask-refining strategy IterMask2. We iteratively mask areas of the image, reconstruct them, and update the mask based on reconstruction error. This iterative process progressively adds information about areas that are confidently normal as per the model. The increasing content guides reconstruction of nearby masked areas, improving reconstruction of normal tissue under these areas, reducing false positives. We also use high-frequency image content as an auxiliary input to provide additional structural information for masked areas. This further improves reconstruction error of normal in comparison to anomalous areas, facilitating segmentation of the latter. We conduct experiments on several brain lesion datasets and demonstrate effectiveness of our method. Code is available at: https://github.com/ZiyunLiang/IterMasks2
Abstract:Annotation ambiguity due to inherent data uncertainties such as blurred boundaries in medical scans and different observer expertise and preferences has become a major obstacle for training deep-learning based medical image segmentation models. To address it, the common practice is to gather multiple annotations from different experts, leading to the setting of multi-rater medical image segmentation. Existing works aim to either merge different annotations into the "groundtruth" that is often unattainable in numerous medical contexts, or generate diverse results, or produce personalized results corresponding to individual expert raters. Here, we bring up a more ambitious goal for multi-rater medical image segmentation, i.e., obtaining both diversified and personalized results. Specifically, we propose a two-stage framework named D-Persona (first Diversification and then Personalization). In Stage I, we exploit multiple given annotations to train a Probabilistic U-Net model, with a bound-constrained loss to improve the prediction diversity. In this way, a common latent space is constructed in Stage I, where different latent codes denote diversified expert opinions. Then, in Stage II, we design multiple attention-based projection heads to adaptively query the corresponding expert prompts from the shared latent space, and then perform the personalized medical image segmentation. We evaluated the proposed model on our in-house Nasopharyngeal Carcinoma dataset and the public lung nodule dataset (i.e., LIDC-IDRI). Extensive experiments demonstrated our D-Persona can provide diversified and personalized results at the same time, achieving new SOTA performance for multi-rater medical image segmentation. Our code will be released at https://github.com/ycwu1997/D-Persona.
Abstract:Microsurgery involves the dexterous manipulation of delicate tissue or fragile structures such as small blood vessels, nerves, etc., under a microscope. To address the limitation of imprecise manipulation of human hands, robotic systems have been developed to assist surgeons in performing complex microsurgical tasks with greater precision and safety. However, the steep learning curve for robot-assisted microsurgery (RAMS) and the shortage of well-trained surgeons pose significant challenges to the widespread adoption of RAMS. Therefore, the development of a versatile training system for RAMS is necessary, which can bring tangible benefits to both surgeons and patients. In this paper, we present a Tactile Internet-Based Micromanipulation System (TIMS) based on a ROS-Django web-based architecture for microsurgical training. This system can provide tactile feedback to operators via a wearable tactile display (WTD), while real-time data is transmitted through the internet via a ROS-Django framework. In addition, TIMS integrates haptic guidance to `guide' the trainees to follow a desired trajectory provided by expert surgeons. Learning from demonstration based on Gaussian Process Regression (GPR) was used to generate the desired trajectory. User studies were also conducted to verify the effectiveness of our proposed TIMS, comparing users' performance with and without tactile feedback and/or haptic guidance.
Abstract:Noisy labels collected with limited annotation cost prevent medical image segmentation algorithms from learning precise semantic correlations. Previous segmentation arts of learning with noisy labels merely perform a pixel-wise manner to preserve semantics, such as pixel-wise label correction, but neglect the pair-wise manner. In fact, we observe that the pair-wise manner capturing affinity relations between pixels can greatly reduce the label noise rate. Motivated by this observation, we present a novel perspective for noisy mitigation by incorporating both pixel-wise and pair-wise manners, where supervisions are derived from noisy class and affinity labels, respectively. Unifying the pixel-wise and pair-wise manners, we propose a robust Joint Class-Affinity Segmentation (JCAS) framework to combat label noise issues in medical image segmentation. Considering the affinity in pair-wise manner incorporates contextual dependencies, a differentiated affinity reasoning (DAR) module is devised to rectify the pixel-wise segmentation prediction by reasoning about intra-class and inter-class affinity relations. To further enhance the noise resistance, a class-affinity loss correction (CALC) strategy is designed to correct supervision signals via the modeled noise label distributions in class and affinity labels. Meanwhile, CALC strategy interacts the pixel-wise and pair-wise manners through the theoretically derived consistency regularization. Extensive experiments under both synthetic and real-world noisy labels corroborate the efficacy of the proposed JCAS framework with a minimum gap towards the upper bound performance. The source code is available at \url{https://github.com/CityU-AIM-Group/JCAS}.
Abstract:This paper studies a practical domain adaptive (DA) semantic segmentation problem where only pseudo-labeled target data is accessible through a black-box model. Due to the domain gap and label shift between two domains, pseudo-labeled target data contains mixed closed-set and open-set label noises. In this paper, we propose a simplex noise transition matrix (SimT) to model the mixed noise distributions in DA semantic segmentation and formulate the problem as estimation of SimT. By exploiting computational geometry analysis and properties of segmentation, we design three complementary regularizers, i.e. volume regularization, anchor guidance, convex guarantee, to approximate the true SimT. Specifically, volume regularization minimizes the volume of simplex formed by rows of the non-square SimT, which ensures outputs of segmentation model to fit into the ground truth label distribution. To compensate for the lack of open-set knowledge, anchor guidance and convex guarantee are devised to facilitate the modeling of open-set noise distribution and enhance the discriminative feature learning among closed-set and open-set classes. The estimated SimT is further utilized to correct noise issues in pseudo labels and promote the generalization ability of segmentation model on target domain data. Extensive experimental results demonstrate that the proposed SimT can be flexibly plugged into existing DA methods to boost the performance. The source code is available at https://github.com/CityU-AIM-Group/SimT.
Abstract:Unsupervised domain adaptation (UDA) aims to transfer the knowledge from the labeled source domain to the unlabeled target domain. Existing self-training based UDA approaches assign pseudo labels for target data and treat them as ground truth labels to fully leverage unlabeled target data for model adaptation. However, the generated pseudo labels from the model optimized on the source domain inevitably contain noise due to the domain gap. To tackle this issue, we advance a MetaCorrection framework, where a Domain-aware Meta-learning strategy is devised to benefit Loss Correction (DMLC) for UDA semantic segmentation. In particular, we model the noise distribution of pseudo labels in target domain by introducing a noise transition matrix (NTM) and construct meta data set with domain-invariant source data to guide the estimation of NTM. Through the risk minimization on the meta data set, the optimized NTM thus can correct the noisy issues in pseudo labels and enhance the generalization ability of the model on the target data. Considering the capacity gap between shallow and deep features, we further employ the proposed DMLC strategy to provide matched and compatible supervision signals for different level features, thereby ensuring deep adaptation. Extensive experimental results highlight the effectiveness of our method against existing state-of-the-art methods on three benchmarks.
Abstract:Automatic melanoma segmentation in dermoscopic images is essential in computer-aided diagnosis of skin cancer. Existing methods may suffer from the hole and shrink problems with limited segmentation performance. To tackle these issues, we propose a novel complementary network with adaptive receptive filed learning. Instead of regarding the segmentation task independently, we introduce a foreground network to detect melanoma lesions and a background network to mask non-melanoma regions. Moreover, we propose adaptive atrous convolution (AAC) and knowledge aggregation module (KAM) to fill holes and alleviate the shrink problems. AAC explicitly controls the receptive field at multiple scales and KAM convolves shallow feature maps by dilated convolutions with adaptive receptive fields, which are adjusted according to deep feature maps. In addition, a novel mutual loss is proposed to utilize the dependency between the foreground and background networks, thereby enabling the reciprocally influence within these two networks. Consequently, this mutual training strategy enables the semi-supervised learning and improve the boundary-sensitivity. Training with Skin Imaging Collaboration (ISIC) 2018 skin lesion segmentation dataset, our method achieves a dice co-efficient of 86.4% and shows better performance compared with state-of-the-art melanoma segmentation methods.
Abstract:Automatically segmenting sub-regions of gliomas (necrosis, edema and enhancing tumor) and accurately predicting overall survival (OS) time from multimodal MRI sequences have important clinical significance in diagnosis, prognosis and treatment of gliomas. However, due to the high degree variations of heterogeneous appearance and individual physical state, the segmentation of sub-regions and OS prediction are very challenging. To deal with these challenges, we utilize a 3D dilated multi-fiber network (DMFNet) with weighted dice loss for brain tumor segmentation, which incorporates prior volume statistic knowledge and obtains a balance between small and large objects in MRI scans. For OS prediction, we propose a DenseNet based 3D neural network with position encoding convolutional layer (PECL) to extract meaningful features from T1 contrast MRI, T2 MRI and previously segmented subregions. Both labeled data and unlabeled data are utilized to prevent over-fitting for semi-supervised learning. Those learned deep features along with handcrafted features (such as ages, volume of tumor) and position encoding segmentation features are fed to a Gradient Boosting Decision Tree (GBDT) to predict a specific OS day