Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Rosa Sicilia

Evaluating Vision Language Model Adaptations for Radiology Report Generation in Low-Resource Languages

May 02, 2025

Marco Salmè, Rosa Sicilia, Paolo Soda, Valerio Guarrasi

Abstract:The integration of artificial intelligence in healthcare has opened new horizons for improving medical diagnostics and patient care. However, challenges persist in developing systems capable of generating accurate and contextually relevant radiology reports, particularly in low-resource languages. In this study, we present a comprehensive benchmark to evaluate the performance of instruction-tuned Vision-Language Models (VLMs) in the specialized task of radiology report generation across three low-resource languages: Italian, German, and Spanish. Employing the LLaVA architectural framework, we conducted a systematic evaluation of pre-trained models utilizing general datasets, domain-specific datasets, and low-resource language-specific datasets. In light of the unavailability of models that possess prior knowledge of both the medical domain and low-resource languages, we analyzed various adaptations to determine the most effective approach for these contexts. The results revealed that language-specific models substantially outperformed both general and domain-specific models in generating radiology reports, emphasizing the critical role of linguistic adaptation. Additionally, models fine-tuned with medical terminology exhibited enhanced performance across all languages compared to models with generic knowledge, highlighting the importance of domain-specific training. We also explored the influence of the temperature parameter on the coherence of report generation, providing insights for optimal model settings. Our findings highlight the importance of tailored language and domain-specific training for improving the quality and accuracy of radiological reports in multilingual settings. This research not only advances our understanding of VLMs adaptability in healthcare but also points to significant avenues for future investigations into model tuning and language-specific adaptations.

Via

Access Paper or Ask Questions

Beyond the Generative Learning Trilemma: Generative Model Assessment in Data Scarcity Domains

Apr 14, 2025

Marco Salmè, Lorenzo Tronchin, Rosa Sicilia, Paolo Soda, Valerio Guarrasi

Figure 1 for Beyond the Generative Learning Trilemma: Generative Model Assessment in Data Scarcity Domains

Figure 2 for Beyond the Generative Learning Trilemma: Generative Model Assessment in Data Scarcity Domains

Figure 3 for Beyond the Generative Learning Trilemma: Generative Model Assessment in Data Scarcity Domains

Figure 4 for Beyond the Generative Learning Trilemma: Generative Model Assessment in Data Scarcity Domains

Abstract:Data scarcity remains a critical bottleneck impeding technological advancements across various domains, including but not limited to medicine and precision agriculture. To address this challenge, we explore the potential of Deep Generative Models (DGMs) in producing synthetic data that satisfies the Generative Learning Trilemma: fidelity, diversity, and sampling efficiency. However, recognizing that these criteria alone are insufficient for practical applications, we extend the trilemma to include utility, robustness, and privacy, factors crucial for ensuring the applicability of DGMs in real-world scenarios. Evaluating these metrics becomes particularly challenging in data-scarce environments, as DGMs traditionally rely on large datasets to perform optimally. This limitation is especially pronounced in domains like medicine and precision agriculture, where ensuring acceptable model performance under data constraints is vital. To address these challenges, we assess the Generative Learning Trilemma in data-scarcity settings using state-of-the-art evaluation metrics, comparing three prominent DGMs: Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), and Diffusion Models (DMs). Furthermore, we propose a comprehensive framework to assess utility, robustness, and privacy in synthetic data generated by DGMs. Our findings demonstrate varying strengths among DGMs, with each model exhibiting unique advantages based on the application context. This study broadens the scope of the Generative Learning Trilemma, aligning it with real-world demands and providing actionable guidance for selecting DGMs tailored to specific applications.

Via

Access Paper or Ask Questions

Class Balancing Diversity Multimodal Ensemble for Alzheimer's Disease Diagnosis and Early Detection

Oct 14, 2024

Arianna Francesconi, Lazzaro di Biase, Donato Cappetta, Fabio Rebecchi, Paolo Soda, Rosa Sicilia, Valerio Guarrasi

Figure 1 for Class Balancing Diversity Multimodal Ensemble for Alzheimer's Disease Diagnosis and Early Detection

Figure 2 for Class Balancing Diversity Multimodal Ensemble for Alzheimer's Disease Diagnosis and Early Detection

Figure 3 for Class Balancing Diversity Multimodal Ensemble for Alzheimer's Disease Diagnosis and Early Detection

Figure 4 for Class Balancing Diversity Multimodal Ensemble for Alzheimer's Disease Diagnosis and Early Detection

Abstract:Alzheimer's disease (AD) poses significant global health challenges due to its increasing prevalence and associated societal costs. Early detection and diagnosis of AD are critical for delaying progression and improving patient outcomes. Traditional diagnostic methods and single-modality data often fall short in identifying early-stage AD and distinguishing it from Mild Cognitive Impairment (MCI). This study addresses these challenges by introducing a novel approach: multImodal enseMble via class BALancing diversity for iMbalancEd Data (IMBALMED). IMBALMED integrates multimodal data from the Alzheimer's Disease Neuroimaging Initiative database, including clinical assessments, neuroimaging phenotypes, biospecimen and subject characteristics data. It employs an ensemble of model classifiers, each trained with different class balancing techniques, to overcome class imbalance and enhance model accuracy. We evaluate IMBALMED on two diagnostic tasks (binary and ternary classification) and four binary early detection tasks (at 12, 24, 36, and 48 months), comparing its performance with state-of-the-art algorithms and an unbalanced dataset method. IMBALMED demonstrates superior diagnostic accuracy and predictive performance in both binary and ternary classification tasks, significantly improving early detection of MCI at 48-month time point. The method shows improved classification performance and robustness, offering a promising solution for early detection and management of AD.

Via

Access Paper or Ask Questions

RadioPathomics: Multimodal Learning in Non-Small Cell Lung Cancer for Adaptive Radiotherapy

Apr 26, 2022

Matteo Tortora, Ermanno Cordelli, Rosa Sicilia, Lorenzo Nibid, Edy Ippolito, Giuseppe Perrone, Sara Ramella, Paolo Soda

Figure 1 for RadioPathomics: Multimodal Learning in Non-Small Cell Lung Cancer for Adaptive Radiotherapy

Figure 2 for RadioPathomics: Multimodal Learning in Non-Small Cell Lung Cancer for Adaptive Radiotherapy

Figure 3 for RadioPathomics: Multimodal Learning in Non-Small Cell Lung Cancer for Adaptive Radiotherapy

Figure 4 for RadioPathomics: Multimodal Learning in Non-Small Cell Lung Cancer for Adaptive Radiotherapy

Abstract:The current cancer treatment practice collects multimodal data, such as radiology images, histopathology slides, genomics and clinical data. The importance of these data sources taken individually has fostered the recent raise of radiomics and pathomics, i.e. the extraction of quantitative features from radiology and histopathology images routinely collected to predict clinical outcomes or to guide clinical decisions using artificial intelligence algorithms. Nevertheless, how to combine them into a single multimodal framework is still an open issue. In this work we therefore develop a multimodal late fusion approach that combines hand-crafted features computed from radiomics, pathomics and clinical data to predict radiation therapy treatment outcomes for non-small-cell lung cancer patients. Within this context, we investigate eight different late fusion rules (i.e. product, maximum, minimum, mean, decision template, Dempster-Shafer, majority voting, and confidence rule) and two patient-wise aggregation rules leveraging the richness of information given by computer tomography images and whole-slide scans. The experiments in leave-one-patient-out cross-validation on an in-house cohort of 33 patients show that the proposed multimodal paradigm with an AUC equal to $90.9\%$ outperforms each unimodal approach, suggesting that data integration can advance precision medicine. As a further contribution, we also compare the hand-crafted representations with features automatically computed by deep networks, and the late fusion paradigm with early fusion, another popular multimodal approach. In both cases, the experiments show that the proposed multimodal approach provides the best results.

Via

Access Paper or Ask Questions

AIforCOVID: predicting the clinical outcomes in patients with COVID-19 applying AI to chest-X-rays. An Italian multicentre study

Dec 11, 2020

Paolo Soda, Natascha Claudia D'Amico, Jacopo Tessadori, Giovanni Valbusa, Valerio Guarrasi, Chandra Bortolotto, Muhammad Usman Akbar, Rosa Sicilia, Ermanno Cordelli, Deborah Fazzini(+18 more)

Figure 1 for AIforCOVID: predicting the clinical outcomes in patients with COVID-19 applying AI to chest-X-rays. An Italian multicentre study

Figure 2 for AIforCOVID: predicting the clinical outcomes in patients with COVID-19 applying AI to chest-X-rays. An Italian multicentre study

Figure 3 for AIforCOVID: predicting the clinical outcomes in patients with COVID-19 applying AI to chest-X-rays. An Italian multicentre study

Figure 4 for AIforCOVID: predicting the clinical outcomes in patients with COVID-19 applying AI to chest-X-rays. An Italian multicentre study

Abstract:Recent epidemiological data report that worldwide more than 53 million people have been infected by SARS-CoV-2, resulting in 1.3 million deaths. The disease has been spreading very rapidly and few months after the identification of the first infected, shortage of hospital resources quickly became a problem. In this work we investigate whether chest X-ray (CXR) can be used as a possible tool for the early identification of patients at risk of severe outcome, like intensive care or death. CXR is a radiological technique that compared to computed tomography (CT) it is simpler, faster, more widespread and it induces lower radiation dose. We present a dataset including data collected from 820 patients by six Italian hospitals in spring 2020 during the first COVID-19 emergency. The dataset includes CXR images, several clinical attributes and clinical outcomes. We investigate the potential of artificial intelligence to predict the prognosis of such patients, distinguishing between severe and mild cases, thus offering a baseline reference for other researchers and practitioners. To this goal, we present three approaches that use features extracted from CXR images, either handcrafted or automatically by convolutional neuronal networks, which are then integrated with the clinical data. Exhaustive evaluation shows promising performance both in 10-fold and leave-one-centre-out cross-validation, implying that clinical data and images have the potential to provide useful information for the management of patients and hospital resources.

Via

Access Paper or Ask Questions