Abstract:We present a novel approach to reconstruction of 3D cardiac motion from sparse intraoperative data. While existing methods can accurately reconstruct 3D organ geometries from full 3D volumetric imaging, they cannot be used during surgical interventions where usually limited observed data, such as a few 2D frames or 1D signals, is available in real-time. We propose a versatile framework for reconstructing 3D motion from such partial data. It discretizes the 3D space into a deformable tetrahedral grid with signed distance values, providing implicit unlimited resolution while maintaining explicit control over motion dynamics. Given an initial 3D model reconstructed from pre-operative full volumetric data, our system, equipped with an universal observation encoder, can reconstruct coherent 3D cardiac motion from full 3D volumes, a few 2D MRI slices or even 1D signals. Extensive experiments on cardiac intervention scenarios demonstrate our ability to generate plausible and anatomically consistent 3D motion reconstructions from various sparse real-time observations, highlighting its potential for multimodal cardiac imaging. Our code and model will be made available at https://github.com/Scalsol/MedTet.
Abstract:Fundus fluorescein angiography (FFA) is crucial for diagnosing and monitoring retinal vascular issues but is limited by its invasive nature and restricted accessibility compared to color fundus (CF) imaging. Existing methods that convert CF images to FFA are confined to static image generation, missing the dynamic lesional changes. We introduce Fundus2Video, an autoregressive generative adversarial network (GAN) model that generates dynamic FFA videos from single CF images. Fundus2Video excels in video generation, achieving an FVD of 1497.12 and a PSNR of 11.77. Clinical experts have validated the fidelity of the generated videos. Additionally, the model's generator demonstrates remarkable downstream transferability across ten external public datasets, including blood vessel segmentation, retinal disease diagnosis, systemic disease prediction, and multimodal retrieval, showcasing impressive zero-shot and few-shot capabilities. These findings position Fundus2Video as a powerful, non-invasive alternative to FFA exams and a versatile retinal generative foundation model that captures both static and temporal retinal features, enabling the representation of complex inter-modality relationships.
Abstract:Early detection of eye diseases like glaucoma, macular degeneration, and diabetic retinopathy is crucial for preventing vision loss. While artificial intelligence (AI) foundation models hold significant promise for addressing these challenges, existing ophthalmic foundation models primarily focus on a single modality, whereas diagnosing eye diseases requires multiple modalities. A critical yet often overlooked aspect is harnessing the multi-view information across various modalities for the same patient. Additionally, due to the long-tail nature of ophthalmic diseases, standard fully supervised or unsupervised learning approaches often struggle. Therefore, it is essential to integrate clinical text to capture a broader spectrum of diseases. We propose EyeCLIP, a visual-language foundation model developed using over 2.77 million multi-modal ophthalmology images with partial text data. To fully leverage the large multi-modal unlabeled and labeled data, we introduced a pretraining strategy that combines self-supervised reconstructions, multi-modal image contrastive learning, and image-text contrastive learning to learn a shared representation of multiple modalities. Through evaluation using 14 benchmark datasets, EyeCLIP can be transferred to a wide range of downstream tasks involving ocular and systemic diseases, achieving state-of-the-art performance in disease classification, visual question answering, and cross-modal retrieval. EyeCLIP represents a significant advancement over previous methods, especially showcasing few-shot, even zero-shot capabilities in real-world long-tail scenarios.
Abstract:Fundus Fluorescein Angiography (FFA) is a critical tool for assessing retinal vascular dynamics and aiding in the diagnosis of eye diseases. However, its invasive nature and less accessibility compared to Color Fundus (CF) images pose significant challenges. Current CF to FFA translation methods are limited to static generation. In this work, we pioneer dynamic FFA video generation from static CF images. We introduce an autoregressive GAN for smooth, memory-saving frame-by-frame FFA synthesis. To enhance the focus on dynamic lesion changes in FFA regions, we design a knowledge mask based on clinical experience. Leveraging this mask, our approach integrates innovative knowledge mask-guided techniques, including knowledge-boosted attention, knowledge-aware discriminators, and mask-enhanced patchNCE loss, aimed at refining generation in critical areas and addressing the pixel misalignment challenge. Our method achieves the best FVD of 1503.21 and PSNR of 11.81 compared to other common video generation approaches. Human assessment by an ophthalmologist confirms its high generation quality. Notably, our knowledge mask surpasses supervised lesion segmentation masks, offering a promising non-invasive alternative to traditional FFA for research and clinical applications. The code is available at https://github.com/Michi-3000/Fundus2Video.
Abstract:Artificial intelligence (AI) is vital in ophthalmology, tackling tasks like diagnosis, classification, and visual question answering (VQA). However, existing AI models in this domain often require extensive annotation and are task-specific, limiting their clinical utility. While recent developments have brought about foundation models for ophthalmology, they are limited by the need to train separate weights for each imaging modality, preventing a comprehensive representation of multi-modal features. This highlights the need for versatile foundation models capable of handling various tasks and modalities in ophthalmology. To address this gap, we present EyeFound, a multimodal foundation model for ophthalmic images. Unlike existing models, EyeFound learns generalizable representations from unlabeled multimodal retinal images, enabling efficient model adaptation across multiple applications. Trained on 2.78 million images from 227 hospitals across 11 ophthalmic modalities, EyeFound facilitates generalist representations and diverse multimodal downstream tasks, even for detecting challenging rare diseases. It outperforms previous work RETFound in diagnosing eye diseases, predicting systemic disease incidents, and zero-shot multimodal VQA. EyeFound provides a generalizable solution to improve model performance and lessen the annotation burden on experts, facilitating widespread clinical AI applications for retinal imaging.
Abstract:Part segmentation is a crucial task for 3D curvilinear structures like neuron dendrites and blood vessels, enabling the analysis of dendritic spines and aneurysms with scientific and clinical significance. However, their diversely winded morphology poses a generalization challenge to existing deep learning methods, which leads to labor-intensive manual correction. In this work, we propose FreSeg, a framework of part segmentation tasks for 3D curvilinear structures. With Frenet-Frame-based point cloud transformation, it enables the models to learn more generalizable features and have significant performance improvements on tasks involving elongated and curvy geometries. We evaluate FreSeg on 2 datasets: 1) DenSpineEM, an in-house dataset for dendritic spine segmentation, and 2) IntrA, a public 3D dataset for intracranial aneurysm segmentation. Further, we will release the DenSpineEM dataset, which includes roughly 6,000 spines from 69 dendrites from 3 public electron microscopy (EM) datasets, to foster the development of effective dendritic spine instance extraction methods and, consequently, large-scale connectivity analysis to better understand mammalian brains.
Abstract:In the realm of healthcare, the challenges of copyright protection and unauthorized third-party misuse are increasingly significant. Traditional methods for data copyright protection are applied prior to data distribution, implying that models trained on these data become uncontrollable. This paper introduces a novel approach, named DataCook, designed to safeguard the copyright of healthcare data during the deployment phase. DataCook operates by "cooking" the raw data before distribution, enabling the development of models that perform normally on this processed data. However, during the deployment phase, the original test data must be also "cooked" through DataCook to ensure normal model performance. This process grants copyright holders control over authorization during the deployment phase. The mechanism behind DataCook is by crafting anti-adversarial examples (AntiAdv), which are designed to enhance model confidence, as opposed to standard adversarial examples (Adv) that aim to confuse models. Similar to Adv, AntiAdv introduces imperceptible perturbations, ensuring that the data processed by DataCook remains easily understandable. We conducted extensive experiments on MedMNIST datasets, encompassing both 2D/3D data and the high-resolution variants. The outcomes indicate that DataCook effectively meets its objectives, preventing models trained on AntiAdv from analyzing unauthorized data effectively, without compromising the validity and accuracy of the data in legitimate scenarios. Code and data are available at https://github.com/MedMNIST/DataCook.
Abstract:Data generated in clinical practice often exhibits biases, such as long-tail imbalance and algorithmic unfairness. This study aims to mitigate these challenges through data synthesis. Previous efforts in medical imaging synthesis have struggled with separating lesion information from background context, leading to difficulties in generating high-quality backgrounds and limited control over the synthetic output. Inspired by diffusion-based image inpainting, we propose LeFusion, lesion-focused diffusion models. By redesigning the diffusion learning objectives to concentrate on lesion areas, it simplifies the model learning process and enhance the controllability of the synthetic output, while preserving background by integrating forward-diffused background contexts into the reverse diffusion process. Furthermore, we generalize it to jointly handle multi-class lesions, and further introduce a generative model for lesion masks to increase synthesis diversity. Validated on the DE-MRI cardiac lesion segmentation dataset (Emidec), our methodology employs the popular nnUNet to demonstrate that the synthetic data make it possible to effectively enhance a state-of-the-art model. Code and model are available at https://github.com/M3DV/LeFusion.
Abstract:Rib fractures are a common and potentially severe injury that can be challenging and labor-intensive to detect in CT scans. While there have been efforts to address this field, the lack of large-scale annotated datasets and evaluation benchmarks has hindered the development and validation of deep learning algorithms. To address this issue, the RibFrac Challenge was introduced, providing a benchmark dataset of over 5,000 rib fractures from 660 CT scans, with voxel-level instance mask annotations and diagnosis labels for four clinical categories (buckle, nondisplaced, displaced, or segmental). The challenge includes two tracks: a detection (instance segmentation) track evaluated by an FROC-style metric and a classification track evaluated by an F1-style metric. During the MICCAI 2020 challenge period, 243 results were evaluated, and seven teams were invited to participate in the challenge summary. The analysis revealed that several top rib fracture detection solutions achieved performance comparable or even better than human experts. Nevertheless, the current rib fracture classification solutions are hardly clinically applicable, which can be an interesting area in the future. As an active benchmark and research resource, the data and online evaluation of the RibFrac Challenge are available at the challenge website. As an independent contribution, we have also extended our previous internal baseline by incorporating recent advancements in large-scale pretrained networks and point-based rib segmentation techniques. The resulting FracNet+ demonstrates competitive performance in rib fracture detection, which lays a foundation for further research and development in AI-assisted rib fracture detection and diagnosis.
Abstract:Pulmonary diseases rank prominently among the principal causes of death worldwide. Curing them will require, among other things, a better understanding of the many complex 3D tree-shaped structures within the pulmonary system, such as airways, arteries, and veins. In theory, they can be modeled using high-resolution image stacks. Unfortunately, standard CNN approaches operating on dense voxel grids are prohibitively expensive. To remedy this, we introduce a point-based approach that preserves graph connectivity of tree skeleton and incorporates an implicit surface representation. It delivers SOTA accuracy at a low computational cost and the resulting models have usable surfaces. Due to the scarcity of publicly accessible data, we have also curated an extensive dataset to evaluate our approach and will make it public.