Abstract:Generating consistent multiple views for 3D reconstruction remains a challenge for existing image-to-3D diffusion models. In general, incorporating 3D representations into a diffusion model reduces its speed, generalizability, and quality. This paper proposes a general framework for generating consistent multi-view images from one or more input images, leveraging a scene representation transformer and a view-conditioned diffusion model. Within the model, we introduce epipolar geometry constraints and multi-view attention to enforce 3D consistency. From as few as one input image, our model generates 3D meshes that surpass baseline methods on evaluation metrics including PSNR, SSIM, and LPIPS.
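Of the three metrics named in the abstract above, PSNR has the simplest closed form and can be sketched in a few lines. The snippet below is only an illustration and not the authors' evaluation code: the function name `psnr`, the `max_value` parameter, and the assumption that images are scaled to [0, 1] are all introduced here.

```python
import numpy as np


def psnr(reference, generated, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape.

    Assumes pixel intensities lie in [0, max_value]; this range is an
    assumption of the sketch, not something stated in the abstract.
    """
    reference = np.asarray(reference, dtype=np.float64)
    generated = np.asarray(generated, dtype=np.float64)
    mse = np.mean((reference - generated) ** 2)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return float("inf")
    return 10.0 * np.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    target_view = np.zeros((4, 4))
    rendered_view = np.full((4, 4), 0.1)
    print(psnr(target_view, rendered_view))
```

Higher PSNR means the generated view is closer to the reference. SSIM and LPIPS are typically computed with existing libraries rather than by hand, since LPIPS in particular depends on a pretrained network.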
Abstract:In this work, we present X-Diffusion, a cross-sectional diffusion model tailored for Magnetic Resonance Imaging (MRI) data. X-Diffusion can generate an entire MRI volume from a single MRI slice, or optionally from a few slices, setting new benchmarks in the precision of MRIs synthesized from extremely sparse observations. Its novelty lies in the view-conditional training and inference of X-Diffusion on MRI volumes, which allows for generalized MRI learning. Our evaluations span both brain tumour MRIs from the BRATS dataset and full-body MRIs from the UK Biobank dataset. Using the paired, pre-registered Dual-energy X-ray Absorptiometry (DXA) and MRI modalities in the UK Biobank dataset, X-Diffusion is able to generate a detailed 3D MRI volume from a single full-body DXA. Remarkably, the resulting MRIs not only stand out in precision on unseen examples (surpassing state-of-the-art results by large margins) but also flawlessly retain essential features of the original MRI, including tumour profiles, spine curvature, brain volume, and beyond. Furthermore, the X-Diffusion model trained on the MRI datasets generalizes out of domain (e.g. generating knee MRIs even though it was trained on brains). The code is available on the project website https://emmanuelleb985.github.io/XDiffusion/ .
Abstract:Segmentation of head and neck (H&N) tumours and prediction of patient outcome are crucial for disease diagnosis and treatment monitoring. The development of robust deep learning models is currently hindered by the lack of large multi-centre, multi-modal data with quality annotations. The MICCAI 2021 HEad and neCK TumOR (HECKTOR) segmentation and outcome prediction challenge provides a platform for comparing methods for segmenting the primary gross target volume on fluoro-deoxyglucose (FDG)-PET and Computed Tomography images and for predicting progression-free survival in H&N oropharyngeal cancer. For the segmentation task, we propose a new network based on an encoder-decoder architecture with full inter- and intra-skip connections to take advantage of low-level and high-level semantics at full scales. Additionally, we use Conditional Random Fields as a post-processing step to refine the predicted segmentation maps. We trained multiple neural networks for tumour volume segmentation and ensembled their segmentations, achieving an average Dice Similarity Coefficient of 0.75 in cross-validation and 0.76 on the challenge testing data set. For the progression-free survival prediction task, we propose a Cox proportional hazards regression combining clinical, radiomic, and deep learning features. Our survival prediction model achieved a concordance index of 0.82 in cross-validation and 0.62 on the challenge testing data set.
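The two scores reported in the abstract above, the Dice Similarity Coefficient and the concordance index, can be illustrated with a minimal sketch. This is not the authors' or the challenge's evaluation code. The function names, the convention of returning 1.0 for two empty masks, and the choice of Harrell's pairwise formulation (higher risk score should mean earlier progression, with tied event times skipped) are all assumptions made here.

```python
import numpy as np


def dice_coefficient(pred_mask, true_mask):
    """Dice Similarity Coefficient between two binary masks."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    denom = pred.sum() + true.sum()
    if denom == 0:
        # Both masks empty: treated as perfect agreement (a convention).
        return 1.0
    return 2.0 * np.logical_and(pred, true).sum() / denom


def concordance_index(times, risk_scores, events):
    """Harrell-style concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i had an observed event and
    times[i] < times[j]. It is concordant when the subject with the
    earlier event also has the higher predicted risk. Ties in risk count
    as half concordant; pairs with tied times are skipped.
    """
    n = len(times)
    concordant = 0.0
    comparable = 0
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    print(dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0]))
    print(concordance_index([1, 2, 3], [3.0, 2.0, 1.0], [1, 1, 1]))
```

Under this formulation a concordance index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for the gap between the cross-validation score (0.82) and the test score (0.62).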