Abstract:Denoising Diffusion Probabilistic Models (DDPMs) have demonstrated significant achievements in various image and video generation tasks, including the domain of medical imaging. However, generating echocardiography videos based on semantic anatomical information remains an unexplored area of research. This is mostly due to the constraints imposed by the currently available datasets, which lack sufficient scale and comprehensive frame-wise annotations for every cardiac cycle. This paper aims to tackle the aforementioned challenges by expanding upon existing video diffusion models for the purpose of cardiac video synthesis. More specifically, our focus lies in generating video using semantic maps of the initial frame during the cardiac cycle, commonly referred to as end diastole. To further improve the synthesis process, we integrate spatial adaptive normalization into multiscale feature maps. This enables the inclusion of semantic guidance during synthesis, resulting in enhanced realism and coherence of the resultant video sequences. Experiments are conducted on the CAMUS dataset, which is a highly used dataset in the field of echocardiography. Our model exhibits better performance compared to the standard diffusion technique in terms of multiple metrics, including FID, FVD, and SSMI.
Abstract:Convolutional neural networks (CNNs) have recently proven their excellent ability to segment 2D cardiac ultrasound images. However, the majority of attempts to perform full-sequence segmentation of cardiac ultrasound videos either rely on models trained only on keyframe images or fail to maintain the topology over time. To address these issues, in this work, we consider segmentation of ultrasound video as a registration estimation problem and present a novel method for diffeomorphic image registration using neural ordinary differential equations (Neural ODE). In particular, we consider the registration field vector field between frames as a continuous trajectory ODE. The estimated registration field is then applied to the segmentation mask of the first frame to obtain a segment for the whole cardiac cycle. The proposed method, Echo-ODE, introduces several key improvements compared to the previous state-of-the-art. Firstly, by solving a continuous ODE, the proposed method achieves smoother segmentation, preserving the topology of segmentation maps over the whole sequence (Hausdorff distance: 3.7-4.4). Secondly, it maintains temporal consistency between frames without explicitly optimizing for temporal consistency attributes, achieving temporal consistency in 91% of the videos in the dataset. Lastly, the proposed method is able to maintain the clinical accuracy of the segmentation maps (MAE of the LVEF: 2.7-3.1). The results show that our method surpasses the previous state-of-the-art in multiple aspects, demonstrating the importance of spatial-temporal data processing for the implementation of Neural ODEs in medical imaging applications. These findings open up new research directions for solving echocardiography segmentation tasks.