Abstract:Accurate estimate of long-term risk is critical for safe decision-making, but sampling from rare risk events and long-term trajectories can be prohibitively costly. Risk gradient can be used in many first-order techniques for learning and control methods, but gradient estimate is difficult to obtain using Monte Carlo (MC) methods because the infinitesimal devisor may significantly amplify sampling noise. Motivated by this gap, we propose an efficient method to evaluate long-term risk probabilities and their gradients using short-term samples without sufficient risk events. We first derive that four types of long-term risk probability are solutions of certain partial differential equations (PDEs). Then, we propose a physics-informed learning technique that integrates data and physics information (aforementioned PDEs). The physics information helps propagate information beyond available data and obtain provable generalization beyond available data, which in turn enables long-term risk to be estimated using short-term samples of safe events. Finally, we demonstrate in simulation that the proposed technique has improved sample efficiency, generalizes well to unseen regions, and adapts to changing system parameters.
Abstract:Prostate cancer (PCa) poses a significant threat to men's health, with early diagnosis being crucial for improving prognosis and reducing mortality rates. Transrectal ultrasound (TRUS) plays a vital role in the diagnosis and image-guided intervention of PCa.To facilitate physicians with more accurate and efficient computer-assisted diagnosis and interventions, many image processing algorithms in TRUS have been proposed and achieved state-of-the-art performance in several tasks, including prostate gland segmentation, prostate image registration, PCa classification and detection, and interventional needle detection.The rapid development of these algorithms over the past two decades necessitates a comprehensive summary. In consequence, this survey provides a systematic analysis of this field, outlining the evolution of image processing methods in the context of TRUS image analysis and meanwhile highlighting their relevant contributions. Furthermore, this survey discusses current challenges and suggests future research directions to possibly advance this field further.
Abstract:Most existing deep learning-based registration methods are trained on single-type images to address same-domain tasks.However, cross-domain deformable registration remains challenging.We argue that the tailor-made matching criteria in traditional registration methods is one of the main reason they are applicable in different domains.Motivated by this, we devise a registration-oriented encoder to model the matching criteria of image features and structural features, which is beneficial to boost registration accuracy and adaptability.Specifically, a general feature encoder (Encoder-G) is proposed to capture comprehensive medical image features, while a structural feature encoder (Encoder-S) is designed to encode the structural self-similarity into the global representation.Extensive experiments on images from three different domains prove the efficacy of the proposed method. Moreover, by updating Encoder-S using one-shot learning, our method can effectively adapt to different domains.The code is publicly available at https://github.com/JuliusWang-7/EncoderReg.
Abstract:This paper addresses the design of safety certificates for stochastic systems, with a focus on ensuring long-term safety through fast real-time control. In stochastic environments, set invariance-based methods that restrict the probability of risk events in infinitesimal time intervals may exhibit significant long-term risks due to cumulative uncertainties/risks. On the other hand, reachability-based approaches that account for the long-term future may require prohibitive computation in real-time decision making. To overcome this challenge involving stringent long-term safety vs. computation tradeoffs, we first introduce a novel technique termed `probabilistic invariance'. This technique characterizes the invariance conditions of the probability of interest. When the target probability is defined using long-term trajectories, this technique can be used to design myopic conditions/controllers with assured long-term safe probability. Then, we integrate this technique into safe control and learning. The proposed control methods efficiently assure long-term safety using neural networks or model predictive controllers with short outlook horizons. The proposed learning methods can be used to guarantee long-term safety during and after training. Finally, we demonstrate the performance of the proposed techniques in numerical simulations.
Abstract:The segmentation of organs in volumetric medical images plays an important role in computer-aided diagnosis and treatment/surgery planning. Conventional 2D convolutional neural networks (CNNs) can hardly exploit the spatial correlation of volumetric data. Current 3D CNNs have the advantage to extract more powerful volumetric representations but they usually suffer from occupying excessive memory and computation nevertheless. In this study we aim to enhance the 2D networks with contextual information for better volumetric image segmentation. Accordingly, we propose a contextual embedding learning approach to facilitate 2D CNNs capturing spatial information properly. Our approach leverages the learned embedding and the slice-wisely neighboring matching as a soft cue to guide the network. In such a way, the contextual information can be transferred slice-by-slice thus boosting the volumetric representation of the network. Experiments on challenging prostate MRI dataset (PROMISE12) and abdominal CT dataset (CHAOS) show that our contextual embedding learning can effectively leverage the inter-slice context and improve segmentation performance. The proposed approach is a plug-and-play, and memory-efficient solution to enhance the 2D networks for volumetric segmentation. The code will be publicly available.
Abstract:Deformable image registration plays a crucial role in medical imaging, aiding in disease diagnosis and image-guided interventions. Traditional iterative methods are slow, while deep learning (DL) accelerates solutions but faces usability and precision challenges. This study introduces a pyramid network with the enhanced motion decomposition Transformer (ModeTv2) operator, showcasing superior pairwise optimization (PO) akin to traditional methods. We re-implement ModeT operator with CUDA extensions to enhance its computational efficiency. We further propose RegHead module which refines deformation fields, improves the realism of deformation and reduces parameters. By adopting the PO, the proposed network balances accuracy, efficiency, and generalizability. Extensive experiments on two public brain MRI datasets and one abdominal CT dataset demonstrate the network's suitability for PO, providing a DL model with enhanced usability and interpretability. The code is publicly available.
Abstract:The advent of deep-learning-based registration networks has addressed the time-consuming challenge in traditional iterative methods.However, the potential of current registration networks for comprehensively capturing spatial relationships has not been fully explored, leading to inadequate performance in large-deformation image registration.The pure convolutional neural networks (CNNs) neglect feature enhancement, while current Transformer-based networks are susceptible to information redundancy.To alleviate these issues, we propose a pyramid attention network (PAN) for deformable medical image registration.Specifically, the proposed PAN incorporates a dual-stream pyramid encoder with channel-wise attention to boost the feature representation.Moreover, a multi-head local attention Transformer is introduced as decoder to analyze motion patterns and generate deformation fields.Extensive experiments on two public brain magnetic resonance imaging (MRI) datasets and one abdominal MRI dataset demonstrate that our method achieves favorable registration performance, while outperforming several CNN-based and Transformer-based registration networks.Our code is publicly available at https://github.com/JuliusWang-7/PAN.
Abstract:The quest for accurate prediction of drug molecule properties poses a fundamental challenge in the realm of Artificial Intelligence Drug Discovery (AIDD). An effective representation of drug molecules emerges as a pivotal component in this pursuit. Contemporary leading-edge research predominantly resorts to self-supervised learning (SSL) techniques to extract meaningful structural representations from large-scale, unlabeled molecular data, subsequently fine-tuning these representations for an array of downstream tasks. However, an inherent shortcoming of these studies lies in their singular reliance on one modality of molecular information, such as molecule image or SMILES representations, thus neglecting the potential complementarity of various molecular modalities. In response to this limitation, we propose MolIG, a novel MultiModaL molecular pre-training framework for predicting molecular properties based on Image and Graph structures. MolIG model innovatively leverages the coherence and correlation between molecule graph and molecule image to execute self-supervised tasks, effectively amalgamating the strengths of both molecular representation forms. This holistic approach allows for the capture of pivotal molecular structural characteristics and high-level semantic information. Upon completion of pre-training, Graph Neural Network (GNN) Encoder is used for the prediction of downstream tasks. In comparison to advanced baseline models, MolIG exhibits enhanced performance in downstream tasks pertaining to molecular property prediction within benchmark groups such as MoleculeNet Benchmark Group and ADMET Benchmark Group.
Abstract:We focus on the problem of how we can enable a robot to collaborate seamlessly with a human partner, specifically in scenarios like collaborative manufacturing where prexisting data is sparse. Much prior work in human-robot collaboration uses observational models of humans (i.e. models that treat the robot purely as an observer) to choose the robot's behavior, but such models do not account for the influence the robot has on the human's actions, which may lead to inefficient interactions. We instead formulate the problem of optimally choosing a collaborative robot's behavior based on a conditional model of the human that depends on the robot's future behavior. First, we propose a novel model-based formulation of conditional behavior prediction that allows the robot to infer the human's intentions based on its future plan in data-sparse environments. We then show how to utilize a conditional model for proactive goal selection and path generation around human collaborators. Finally, we use our proposed proactive controller in a collaborative task with real users to show that it can improve users' interactions with a robot collaborator quantitatively and qualitatively.
Abstract:Medication recommendation using Electronic Health Records (EHR) is challenging due to complex medical data. Current approaches extract longitudinal information from patient EHR to personalize recommendations. However, existing models often lack sufficient patient representation and overlook the importance of considering the similarity between a patient's medication records and specific medicines. Therefore, an Attention-guided Collaborative Decision Network (ACDNet) for medication recommendation is proposed in this paper. Specifically, ACDNet utilizes attention mechanism and Transformer to effectively capture patient health conditions and medication records by modeling their historical visits at both global and local levels. ACDNet also employs a collaborative decision framework, utilizing the similarity between medication records and medicine representation to facilitate the recommendation process. The experimental results on two extensive medical datasets, MIMIC-III and MIMIC-IV, clearly demonstrate that ACDNet outperforms state-of-the-art models in terms of Jaccard, PR-AUC, and F1 score, reaffirming its superiority. Moreover, the ablation experiments provide solid evidence of the effectiveness of each module in ACDNet, validating their contribution to the overall performance. Furthermore, a detailed case study reinforces the effectiveness of ACDNet in medication recommendation based on EHR data, showcasing its practical value in real-world healthcare scenarios.