Abstract: Computed Tomography (CT) plays a pivotal role in medical diagnosis; however, variability across reconstruction kernels hinders data-driven approaches, such as deep learning models, from achieving reliable and generalized performance. To this end, CT data harmonization has emerged as a promising solution to minimize such non-biological variances by standardizing data across different sources or conditions. In this context, Generative Adversarial Networks (GANs) have proved to be a powerful framework for harmonization, framing it as a style-transfer problem. However, GAN-based approaches still face limitations in capturing complex relationships within the images, which are essential for effective harmonization. In this work, we propose a novel texture-aware StarGAN for CT data harmonization, enabling one-to-many translations across different reconstruction kernels. Although the StarGAN model has been successfully applied in other domains, its potential for CT data harmonization remains unexplored. Furthermore, our approach introduces a multi-scale texture loss function that embeds texture information across different spatial and angular scales into the harmonization process, effectively addressing kernel-induced texture variations. We conducted extensive experimentation on a publicly available dataset, utilizing a total of 48,667 chest CT slices from 197 patients distributed over three different reconstruction kernels, demonstrating the superiority of our method over the baseline StarGAN.
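To make the texture term concrete, the sketch below shows one plausible way to penalize texture discrepancies across spatial and angular scales: a Gabor filter bank is applied to both the harmonized and target slices and the L1 distance between the responses is taken. The bank, scales, and weighting are illustrative assumptions, not the paper's exact multi-scale texture loss.

```python
# A minimal sketch (not the authors' exact formulation): a texture loss that
# compares Gabor filter-bank responses at several spatial scales and
# orientations between the harmonized and target CT slices.
import math
import torch
import torch.nn.functional as F

def gabor_kernel(size, sigma, theta, lambd, gamma=0.5):
    """Build a single real-valued Gabor kernel as a 2D tensor."""
    half = size // 2
    y, x = torch.meshgrid(
        torch.arange(-half, half + 1, dtype=torch.float32),
        torch.arange(-half, half + 1, dtype=torch.float32),
        indexing="ij",
    )
    x_t = x * math.cos(theta) + y * math.sin(theta)
    y_t = -x * math.sin(theta) + y * math.cos(theta)
    envelope = torch.exp(-(x_t**2 + (gamma * y_t) ** 2) / (2 * sigma**2))
    return envelope * torch.cos(2 * math.pi * x_t / lambd)

class MultiScaleTextureLoss(torch.nn.Module):
    """L1 distance between Gabor responses at multiple scales and angles."""
    def __init__(self, scales=(4.0, 8.0, 16.0), n_angles=4, size=15):
        super().__init__()
        kernels = []
        for lambd in scales:                        # spatial scales
            for k in range(n_angles):               # angular scales
                theta = k * math.pi / n_angles
                kernels.append(gabor_kernel(size, 0.56 * lambd, theta, lambd))
        bank = torch.stack(kernels).unsqueeze(1)    # (scales*angles, 1, size, size)
        self.register_buffer("bank", bank)

    def forward(self, fake, real):
        pad = self.bank.shape[-1] // 2
        resp_fake = F.conv2d(fake, self.bank, padding=pad)
        resp_real = F.conv2d(real, self.bank, padding=pad)
        return F.l1_loss(resp_fake, resp_real)

# Usage: add to the generator objective, e.g.
# loss = adv_loss + cyc_loss + lambda_tex * MultiScaleTextureLoss()(fake_ct, real_ct)
```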
Abstract: Generating positron emission tomography (PET) images from computed tomography (CT) scans via deep learning offers a promising pathway to reduce radiation exposure and costs associated with PET imaging, improving patient care and accessibility to functional imaging. Whole-body image translation presents challenges due to anatomical heterogeneity, often limiting generalized models. We propose a framework that segments whole-body CT images into four regions (head, trunk, arms, and legs) and uses district-specific Generative Adversarial Networks (GANs) for tailored CT-to-PET translation. Synthetic PET images from each region are stitched together to reconstruct the whole-body scan. We compared our approach with a baseline non-segmented GAN and experimented with both Pix2Pix and CycleGAN architectures, covering paired and unpaired scenarios. Quantitative evaluations at the district, whole-body, and lesion levels demonstrated significant improvements with our district-specific GANs. Pix2Pix yielded superior metrics, ensuring precise, high-quality image synthesis. By addressing anatomical heterogeneity, this approach achieves state-of-the-art results in whole-body CT-to-PET translation. This methodology supports healthcare Digital Twins by enabling accurate virtual PET scans from CT data, creating virtual imaging representations to monitor, predict, and optimize health outcomes.
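The snippet below sketches the district-wise routing idea under simple assumptions: a toy generator stands in for each region-specific GAN, region masks are assumed to be given, and the per-region outputs are stitched back together by masking. It illustrates the routing and stitching only, not the paper's segmentation or GAN training pipeline.

```python
# A minimal sketch of district-specific CT-to-PET routing and stitching
# (region boundaries and generator classes are hypothetical placeholders).
import torch
import torch.nn as nn

class TinyGenerator(nn.Module):
    """Stand-in for a district-specific CT-to-PET generator (e.g. Pix2Pix)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1),
        )
    def forward(self, x):
        return self.net(x)

REGIONS = ["head", "trunk", "arms", "legs"]
generators = {r: TinyGenerator() for r in REGIONS}

def translate_whole_body(ct_volume, region_masks):
    """ct_volume: (Z, 1, H, W); region_masks: dict of boolean (Z, H, W) masks."""
    pet = torch.zeros_like(ct_volume)
    for region, gen in generators.items():
        mask = region_masks[region].unsqueeze(1).float()   # (Z, 1, H, W)
        with torch.no_grad():
            pet_region = gen(ct_volume * mask)
        pet = pet + pet_region * mask    # stitch district outputs back together
    return pet

# toy usage with synthetic data and a trivial "trunk everywhere" mask
ct = torch.randn(6, 1, 64, 64)
masks = {r: torch.zeros(6, 64, 64, dtype=torch.bool) for r in REGIONS}
masks["trunk"][:] = True
pet = translate_whole_body(ct, masks)
```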
Abstract: Non-small cell lung cancer (NSCLC) remains a major global health challenge, with high post-surgical recurrence rates underscoring the need for accurate pathological response predictions to guide personalized treatments. Although artificial intelligence models show promise in this domain, their clinical adoption is limited by the lack of medically grounded guidance during training, which often results in predictions that lack intrinsic explainability. To address this, we propose Doctor-in-the-Loop, a novel framework that integrates expert-driven domain knowledge with explainable artificial intelligence techniques, directing the model toward clinically relevant anatomical regions and improving both interpretability and trustworthiness. Our approach employs a gradual multi-view strategy, progressively refining the model's focus from broad contextual features to finer, lesion-specific details. By incorporating domain insights at every stage, we enhance predictive accuracy while ensuring that the model's decision-making process aligns more closely with clinical reasoning. Evaluated on a dataset of NSCLC patients, Doctor-in-the-Loop delivers promising predictive performance and provides transparent, justifiable outputs, representing a significant step toward clinically explainable artificial intelligence in oncology.
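As a rough illustration of a gradual multi-view curriculum, the sketch below trains a toy classifier on progressively tighter crops around the lesion. The crop sizes, backbone, and synthetic data are assumptions, and the expert-driven anatomical guidance central to Doctor-in-the-Loop is not modeled here.

```python
# A conceptual sketch of a gradual multi-view curriculum: training proceeds
# from broad contextual views to tight lesion-centred views.
import torch
import torch.nn as nn
import torch.nn.functional as F

# toy stand-ins for CT views, lesion centres, and pathological-response labels
images = torch.randn(16, 1, 160, 160)
centers = [(80, 80)] * 16
labels = torch.randint(0, 2, (16,))

backbone = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 2),
)
optimizer = torch.optim.Adam(backbone.parameters(), lr=1e-4)

def crop_view(image, center, size):
    """Square view of the given size centred on the lesion."""
    y, x = center
    h = size // 2
    return image[..., y - h:y + h, x - h:x + h]

# each stage narrows the view: broad anatomical context -> lesion-level detail
for view_size in (128, 64, 32):
    for img, ctr, lbl in zip(images, centers, labels):
        view = crop_view(img.unsqueeze(0), ctr, view_size)
        loss = F.cross_entropy(backbone(view), lbl.unsqueeze(0))
        optimizer.zero_grad(); loss.backward(); optimizer.step()
```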
Abstract: Full-Field Digital Mammography (FFDM) is the primary imaging modality for routine breast cancer screening; however, its effectiveness is limited in patients with dense breast tissue or fibrocystic conditions. Contrast-Enhanced Spectral Mammography (CESM), a second-level imaging technique, offers enhanced accuracy in tumor detection. Nonetheless, its application is restricted due to higher radiation exposure, the use of contrast agents, and limited accessibility. As a result, CESM is typically reserved for select cases, leaving many patients to rely solely on FFDM despite the superior diagnostic performance of CESM. While biopsy remains the gold standard for definitive diagnosis, it is an invasive procedure that can cause discomfort for patients. We introduce a multimodal, multi-view deep learning approach for virtual biopsy, integrating FFDM and CESM modalities in craniocaudal and mediolateral oblique views to classify lesions as malignant or benign. To address the challenge of missing CESM data, we leverage generative artificial intelligence to impute CESM images from FFDM scans. Experimental results demonstrate that incorporating the CESM modality is crucial to enhance the performance of virtual biopsy. When real CESM data is missing, synthetic CESM images prove effective, outperforming the use of FFDM alone, particularly in multimodal configurations that combine FFDM and CESM modalities. The proposed approach has the potential to improve diagnostic workflows, providing clinicians with augmented intelligence tools to enhance diagnostic accuracy and patient care. Additionally, as a contribution to the research community, we publicly release the dataset used in our experiments, facilitating further advancements in this field.
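The sketch below illustrates the multimodal, multi-view setup under simplifying assumptions: one small encoder per (modality, view) input, a placeholder FFDM-to-CESM generator used to impute missing CESM views, and a joint classifier. Module names, shapes, and the generator are hypothetical stand-ins, not the paper's architecture.

```python
# A minimal sketch of a multimodal, multi-view virtual-biopsy classifier with
# on-the-fly imputation of missing CESM views from FFDM.
import torch
import torch.nn as nn

def make_encoder():
    return nn.Sequential(
        nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),               # -> (B, 8)
    )

class VirtualBiopsyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoders = nn.ModuleDict({
            k: make_encoder() for k in ["ffdm_cc", "ffdm_mlo", "cesm_cc", "cesm_mlo"]
        })
        self.classifier = nn.Linear(8 * 4, 2)                # benign vs malignant
        # stand-in for a pretrained FFDM-to-CESM generative model
        self.ffdm_to_cesm = nn.Conv2d(1, 1, 3, padding=1)

    def forward(self, ffdm_cc, ffdm_mlo, cesm_cc=None, cesm_mlo=None):
        # impute each missing CESM view from the corresponding FFDM view
        if cesm_cc is None:
            cesm_cc = self.ffdm_to_cesm(ffdm_cc)
        if cesm_mlo is None:
            cesm_mlo = self.ffdm_to_cesm(ffdm_mlo)
        feats = torch.cat([
            self.encoders["ffdm_cc"](ffdm_cc),
            self.encoders["ffdm_mlo"](ffdm_mlo),
            self.encoders["cesm_cc"](cesm_cc),
            self.encoders["cesm_mlo"](cesm_mlo),
        ], dim=1)
        return self.classifier(feats)

# toy usage: only the two FFDM views are provided, CESM views are imputed
logits = VirtualBiopsyNet()(torch.randn(2, 1, 256, 256), torch.randn(2, 1, 256, 256))
```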
Abstract: Artificial Intelligence is revolutionizing medical practice, enhancing diagnostic accuracy and healthcare delivery. However, its adoption in medical settings still faces significant challenges related to data availability and privacy constraints. Synthetic data has emerged as a promising solution to mitigate these issues, addressing data scarcity while preserving privacy. Recently, Latent Diffusion Models have emerged as a powerful tool for generating high-quality synthetic data. Meanwhile, the integration of different modalities has gained interest, emphasizing the need for models capable of handling multimodal medical data. Existing approaches struggle to integrate complementary information and lack the ability to generate modalities simultaneously. To address this challenge, we present MedCoDi-M, a 6.77-billion-parameter model designed for multimodal medical data generation that, following the Foundation Model paradigm, exploits contrastive learning and large quantities of data to build a shared latent space capturing the relationships between different data modalities. Furthermore, we introduce a Multi-Prompt training technique that significantly boosts MedCoDi-M's generation capability under different settings. We extensively validate MedCoDi-M: first, we benchmark it against five competitors on the MIMIC-CXR dataset, a state-of-the-art benchmark for chest X-ray and radiological report generation. Second, we perform a Visual Turing Test with expert radiologists to assess the realism and clinical relevance of the generated data, ensuring alignment with real-world scenarios. Finally, we assess the utility of MedCoDi-M in addressing key challenges in the medical field, such as anonymization, data scarcity, and imbalanced learning. The results are promising, demonstrating the applicability of MedCoDi-M in medical contexts. The project page is at https://cosbidev.github.io/MedCoDi-M/.
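As a toy illustration of how contrastive learning can build a shared latent space across modalities, the snippet below implements a symmetric, CLIP-style alignment loss between image and report embeddings. The encoders, embedding dimension, and temperature are placeholders and do not reflect MedCoDi-M's actual 6.77-billion-parameter architecture or Multi-Prompt training.

```python
# A toy sketch of contrastive alignment in a shared latent space: matched
# image/report pairs are pulled together, mismatched pairs pushed apart.
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature       # (B, B) similarity matrix
    targets = torch.arange(img_emb.size(0))            # diagonal = matched pairs
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# toy embeddings standing in for encoded chest X-rays and radiology reports
loss = contrastive_alignment_loss(torch.randn(8, 256), torch.randn(8, 256))
```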
Abstract: In healthcare, the integration of multimodal data is pivotal for developing comprehensive diagnostic and predictive models. However, managing missing data remains a significant challenge in real-world applications. We introduce MARIA (Multimodal Attention Resilient to Incomplete datA), a novel transformer-based deep learning model designed to address these challenges through an intermediate fusion strategy. Unlike conventional approaches that depend on imputation, MARIA utilizes a masked self-attention mechanism, which processes only the available data without generating synthetic values. This approach enables it to effectively handle incomplete datasets, enhancing robustness and minimizing biases introduced by imputation methods. We evaluated MARIA against 10 state-of-the-art machine learning and deep learning models across 8 diagnostic and prognostic tasks. The results demonstrate that MARIA outperforms existing methods in terms of performance and resilience to varying levels of data incompleteness, underscoring its potential for critical healthcare applications.
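The sketch below illustrates the general idea of attending only over observed inputs: each modality becomes a token, and missing modalities are excluded through an attention padding mask rather than imputed. It is a minimal stand-in for the mechanism described above, not MARIA's architecture.

```python
# A minimal sketch of masked self-attention over available modalities only.
import torch
import torch.nn as nn

class MaskedFusion(nn.Module):
    def __init__(self, dim=32, n_classes=2):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, tokens, missing):
        """tokens: (B, M, dim) modality embeddings; missing: (B, M) bool mask."""
        # key_padding_mask=True positions are ignored by the attention
        fused, _ = self.attn(tokens, tokens, tokens, key_padding_mask=missing)
        fused = fused.masked_fill(missing.unsqueeze(-1), 0.0)   # drop absent slots
        n_avail = (~missing).sum(dim=1, keepdim=True).clamp(min=1)
        pooled = fused.sum(dim=1) / n_avail                     # mean over observed
        return self.head(pooled)

tokens = torch.randn(4, 3, 32)                        # 3 modality tokens per sample
missing = torch.tensor([[False, True, False]] * 4)    # second modality is absent
logits = MaskedFusion()(tokens, missing)
```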
Abstract: Alzheimer's disease (AD) poses significant global health challenges due to its increasing prevalence and associated societal costs. Early detection and diagnosis of AD are critical for delaying progression and improving patient outcomes. Traditional diagnostic methods and single-modality data often fall short in identifying early-stage AD and distinguishing it from Mild Cognitive Impairment (MCI). This study addresses these challenges by introducing a novel approach: multImodal enseMble via class BALancing diversity for iMbalancEd Data (IMBALMED). IMBALMED integrates multimodal data from the Alzheimer's Disease Neuroimaging Initiative database, including clinical assessments, neuroimaging phenotypes, biospecimen and subject characteristics data. It employs an ensemble of classifiers, each trained with a different class balancing technique, to overcome class imbalance and enhance model accuracy. We evaluate IMBALMED on two diagnostic tasks (binary and ternary classification) and four binary early detection tasks (at 12, 24, 36, and 48 months), comparing its performance with state-of-the-art algorithms and with a baseline trained on the unbalanced dataset. IMBALMED demonstrates superior diagnostic accuracy and predictive performance in both binary and ternary classification tasks, significantly improving early detection of MCI at the 48-month time point. The method shows improved classification performance and robustness, offering a promising solution for early detection and management of AD.
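A minimal sketch of the ensemble-over-balancing-strategies idea follows: the same base classifier is trained on raw, class-weighted, and oversampled versions of the data, and the predicted probabilities are averaged. The classifier and the balancing choices are illustrative assumptions, not IMBALMED's exact configuration.

```python
# A toy ensemble in which each member sees the data under a different
# class-balancing strategy; predictions are averaged at inference time.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils import resample

def oversample_minority(X, y):
    """Randomly oversample every class up to the size of the largest class."""
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    Xs, ys = [], []
    for c in classes:
        Xr, yr = resample(X[y == c], y[y == c], replace=True,
                          n_samples=n_max, random_state=0)
        Xs.append(Xr); ys.append(yr)
    return np.vstack(Xs), np.concatenate(ys)

def train_balanced_ensemble(X, y):
    members = []
    members.append(RandomForestClassifier(random_state=0).fit(X, y))          # raw
    members.append(RandomForestClassifier(class_weight="balanced",
                                          random_state=0).fit(X, y))          # weighted
    Xo, yo = oversample_minority(X, y)
    members.append(RandomForestClassifier(random_state=0).fit(Xo, yo))        # oversampled
    return members

def predict_proba_ensemble(members, X):
    return np.mean([m.predict_proba(X) for m in members], axis=0)

# toy imbalanced dataset standing in for the multimodal features
X, y = make_classification(n_samples=300, weights=[0.9, 0.1], random_state=0)
probs = predict_proba_ensemble(train_balanced_ensemble(X, y), X)
```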
Abstract: Deep learning has revolutionized biomedical research by providing sophisticated methods to handle complex, high-dimensional data. Multimodal deep learning (MDL) further enhances this capability by integrating diverse data types such as imaging, textual data, and genetic information, leading to more robust and accurate predictive models. In MDL, unlike early and late fusion methods, intermediate fusion stands out for its ability to effectively combine modality-specific features during the learning process. This systematic review aims to comprehensively analyze and formalize current intermediate fusion methods in biomedical applications. We investigate the techniques employed, the challenges faced, and potential future directions for advancing intermediate fusion methods. Additionally, we introduce a structured notation to enhance the understanding and application of these methods beyond the biomedical domain. Our findings are intended to support researchers, healthcare professionals, and the broader deep learning community in developing more sophisticated and insightful multimodal models. Through this review, we aim to provide a foundational framework for future research and practical applications in the dynamic field of MDL.
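For readers new to the terminology, the sketch below shows a minimal intermediate fusion model: modality-specific encoders produce learned features that are merged mid-network, in contrast to fusing raw inputs (early fusion) or final predictions (late fusion). Layers and shapes are illustrative only.

```python
# A minimal sketch of intermediate fusion: modality-specific features are
# learned separately and merged before a shared prediction head.
import torch
import torch.nn as nn

class IntermediateFusionNet(nn.Module):
    def __init__(self, img_channels=1, tab_features=20, n_classes=2):
        super().__init__()
        self.img_encoder = nn.Sequential(
            nn.Conv2d(img_channels, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),             # -> (B, 8)
        )
        self.tab_encoder = nn.Sequential(
            nn.Linear(tab_features, 16), nn.ReLU(),            # -> (B, 16)
        )
        self.fusion_head = nn.Sequential(
            nn.Linear(8 + 16, 32), nn.ReLU(), nn.Linear(32, n_classes),
        )

    def forward(self, image, tabular):
        z = torch.cat([self.img_encoder(image), self.tab_encoder(tabular)], dim=1)
        return self.fusion_head(z)

# toy usage: one imaging modality and one tabular modality
logits = IntermediateFusionNet()(torch.randn(4, 1, 64, 64), torch.randn(4, 20))
```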
Abstract: Long-tailed data is a special type of multi-class imbalanced data with a very large number of minority/tail classes that together exert a significant influence. Long-tailed learning aims to build high-performance models on datasets with long-tailed distributions, which can identify all the classes with high accuracy, in particular the minority/tail classes. It is a cutting-edge research direction that has attracted a remarkable amount of research effort in the past few years. In this paper, we present a comprehensive survey of the latest advances in long-tailed visual learning. We first propose a new taxonomy for long-tailed learning, which consists of eight different dimensions: data balancing, neural architecture, feature enrichment, logits adjustment, loss function, bells and whistles, network optimization, and post hoc processing techniques. Based on our proposed taxonomy, we present a systematic review of long-tailed learning methods, discussing their commonalities and key differences. We also analyze the differences between imbalanced learning and long-tailed learning approaches. Finally, we discuss prospects and future directions in this field.
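As a worked example of one taxonomy dimension, logits adjustment, the snippet below shifts class logits by the log of the class priors before the cross-entropy so that tail classes are not systematically under-predicted. The class counts and scaling factor are illustrative.

```python
# A small worked example of logit adjustment for long-tailed classification.
import torch
import torch.nn.functional as F

def logit_adjusted_cross_entropy(logits, targets, class_counts, tau=1.0):
    """Cross-entropy on logits shifted by tau * log(class prior)."""
    priors = class_counts.float() / class_counts.sum()
    adjusted = logits + tau * torch.log(priors)       # broadcast over the batch
    return F.cross_entropy(adjusted, targets)

# toy long-tailed setup: 3 classes with 1000 / 100 / 10 training samples
class_counts = torch.tensor([1000, 100, 10])
loss = logit_adjusted_cross_entropy(torch.randn(8, 3),
                                    torch.randint(0, 3, (8,)),
                                    class_counts)
```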
Abstract: Handling missing values in tabular datasets presents a significant challenge in training and testing artificial intelligence models, an issue usually addressed using imputation techniques. Here we introduce "Not Another Imputation Method" (NAIM), a novel transformer-based model specifically designed to address this issue without the need for traditional imputation techniques. NAIM employs feature-specific embeddings and a masked self-attention mechanism that effectively learns from available data, thus avoiding the need to impute missing values. Additionally, a novel regularization technique is introduced to enhance the model's generalization capability from incomplete data. We extensively evaluated NAIM on 5 publicly available tabular datasets, demonstrating its superior performance over 6 state-of-the-art machine learning models and 4 deep learning models, each paired with 3 different imputation techniques when necessary. The results highlight the efficacy of NAIM in improving predictive performance and resilience in the presence of missing data. To facilitate further research and practical application in handling missing data without traditional imputation methods, we have made the code for NAIM available at https://github.com/cosbidev/NAIM.
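The sketch below illustrates the two ingredients named above for tabular data: feature-specific embeddings and masked self-attention over observed entries, with missing values flagged as NaN and excluded via a padding mask rather than imputed. It follows the abstract's description in a minimal form, not the released NAIM code.

```python
# A minimal sketch of per-feature embeddings plus masked self-attention for
# tabular data with missing values (NaN entries are never imputed).
import torch
import torch.nn as nn

class TabularMaskedAttention(nn.Module):
    def __init__(self, n_features=10, dim=16, n_classes=2):
        super().__init__()
        # feature-specific embeddings: one direction and bias per column
        self.value_proj = nn.Parameter(torch.randn(n_features, dim))
        self.feature_bias = nn.Parameter(torch.randn(n_features, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, x):
        """x: (B, F) float tensor where missing entries are NaN."""
        missing = torch.isnan(x)                       # (B, F) True where absent
        x = torch.nan_to_num(x, nan=0.0)
        tokens = x.unsqueeze(-1) * self.value_proj + self.feature_bias
        out, _ = self.attn(tokens, tokens, tokens, key_padding_mask=missing)
        out = out.masked_fill(missing.unsqueeze(-1), 0.0)
        pooled = out.sum(1) / (~missing).sum(1, keepdim=True).clamp(min=1)
        return self.head(pooled)

# toy usage with one missing entry in the first row
x = torch.randn(4, 10)
x[0, 3] = float("nan")
logits = TabularMaskedAttention()(x)
```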