Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Mohammad Amin Khalafi

Vision Language Models versus Machine Learning Models Performance on Polyp Detection and Classification in Colonoscopy Images

Mar 27, 2025

Mohammad Amin Khalafi, Seyed Amir Ahmad Safavi-Naini, Ameneh Salehi, Nariman Naderi, Dorsa Alijanzadeh, Pardis Ketabi Moghadam, Kaveh Kavosi, Negar Golestani, Shabnam Shahrokh, Soltanali Fallah(+6 more)

Abstract:Introduction: This study provides a comprehensive performance assessment of vision-language models (VLMs) against established convolutional neural networks (CNNs) and classic machine learning models (CMLs) for computer-aided detection (CADe) and computer-aided diagnosis (CADx) of colonoscopy polyp images. Method: We analyzed 2,258 colonoscopy images with corresponding pathology reports from 428 patients. We preprocessed all images using standardized techniques (resizing, normalization, and augmentation) and implemented a rigorous comparative framework evaluating 11 distinct models: ResNet50, 4 CMLs (random forest, support vector machine, logistic regression, decision tree), two specialized contrastive vision language encoders (CLIP, BiomedCLIP), and three general-purpose VLMs ( GPT-4 Gemini-1.5-Pro, Claude-3-Opus). Our performance assessment focused on two clinical tasks: polyp detection (CADe) and classification (CADx). Result: In polyp detection, ResNet50 achieved the best performance (F1: 91.35%, AUROC: 0.98), followed by BiomedCLIP (F1: 88.68%, AUROC: [AS1] ). GPT-4 demonstrated comparable effectiveness to traditional machine learning approaches (F1: 81.02%, AUROC: [AS2] ), outperforming other general-purpose VLMs. For polyp classification, performance rankings remained consistent but with lower overall metrics. ResNet50 maintained the highest efficacy (weighted F1: 74.94%), while GPT-4 demonstrated moderate capability (weighted F1: 41.18%), significantly exceeding other VLMs (Claude-3-Opus weighted F1: 25.54%, Gemini 1.5 Pro weighted F1: 6.17%). Conclusion: CNNs remain superior for both CADx and CADe tasks. However, VLMs like BioMedCLIP and GPT-4 may be useful for polyp detection tasks where training CNNs is not feasible.

* Code is available at: https://github.com/aminkhalafi/CML-vs-LLM-on-Polyp-Detection. CoI: AlSo serves on the advisory board and holds equity in Virgo Surgical Solutions. The other authors declare no conflicts of interest. Data

Via

Access Paper or Ask Questions

AI-Driven Non-Invasive Detection and Staging of Steatosis in Fatty Liver Disease Using a Novel Cascade Model and Information Fusion Techniques

Dec 06, 2024

Niloufar Delfan, Pardis Ketabi Moghadam, Mohammad Khoshnevisan, Mehdi Hosseini Chagahi, Behzad Hatami, Melika Asgharzadeh, Mohammadreza Zali, Behzad Moshiri, Amin Momeni Moghaddam, Mohammad Amin Khalafi(+1 more)

Figure 1 for AI-Driven Non-Invasive Detection and Staging of Steatosis in Fatty Liver Disease Using a Novel Cascade Model and Information Fusion Techniques

Figure 2 for AI-Driven Non-Invasive Detection and Staging of Steatosis in Fatty Liver Disease Using a Novel Cascade Model and Information Fusion Techniques

Figure 3 for AI-Driven Non-Invasive Detection and Staging of Steatosis in Fatty Liver Disease Using a Novel Cascade Model and Information Fusion Techniques

Figure 4 for AI-Driven Non-Invasive Detection and Staging of Steatosis in Fatty Liver Disease Using a Novel Cascade Model and Information Fusion Techniques

Abstract:Non-alcoholic fatty liver disease (NAFLD) is one of the most widespread liver disorders on a global scale, posing a significant threat of progressing to more severe conditions like nonalcoholic steatohepatitis (NASH), liver fibrosis, cirrhosis, and hepatocellular carcinoma. Diagnosing and staging NAFLD presents challenges due to its non-specific symptoms and the invasive nature of liver biopsies. Our research introduces a novel artificial intelligence cascade model employing ensemble learning and feature fusion techniques. We developed a non-invasive, robust, and reliable diagnostic artificial intelligence tool that utilizes anthropometric and laboratory parameters, facilitating early detection and intervention in NAFLD progression. Our novel artificial intelligence achieved an 86% accuracy rate for the NASH steatosis staging task (non-NASH, steatosis grade 1, steatosis grade 2, and steatosis grade 3) and an impressive 96% AUC-ROC for distinguishing between NASH (steatosis grade 1, grade 2, and grade3) and non-NASH cases, outperforming current state-of-the-art models. This notable improvement in diagnostic performance underscores the potential application of artificial intelligence in the early diagnosis and treatment of NAFLD, leading to better patient outcomes and a reduced healthcare burden associated with advanced liver disease.

Via

Access Paper or Ask Questions

Large Language Models versus Classical Machine Learning: Performance in COVID-19 Mortality Prediction Using High-Dimensional Tabular Data

Sep 02, 2024

Mohammadreza Ghaffarzadeh-Esfahani, Mahdi Ghaffarzadeh-Esfahani, Arian Salahi-Niri, Hossein Toreyhi, Zahra Atf, Amirali Mohsenzadeh-Kermani, Mahshad Sarikhani, Zohreh Tajabadi, Fatemeh Shojaeian, Mohammad Hassan Bagheri(+32 more)

Figure 1 for Large Language Models versus Classical Machine Learning: Performance in COVID-19 Mortality Prediction Using High-Dimensional Tabular Data

Figure 2 for Large Language Models versus Classical Machine Learning: Performance in COVID-19 Mortality Prediction Using High-Dimensional Tabular Data

Figure 3 for Large Language Models versus Classical Machine Learning: Performance in COVID-19 Mortality Prediction Using High-Dimensional Tabular Data

Figure 4 for Large Language Models versus Classical Machine Learning: Performance in COVID-19 Mortality Prediction Using High-Dimensional Tabular Data

Abstract:Background: This study aimed to evaluate and compare the performance of classical machine learning models (CMLs) and large language models (LLMs) in predicting mortality associated with COVID-19 by utilizing a high-dimensional tabular dataset. Materials and Methods: We analyzed data from 9,134 COVID-19 patients collected across four hospitals. Seven CML models, including XGBoost and random forest (RF), were trained and evaluated. The structured data was converted into text for zero-shot classification by eight LLMs, including GPT-4 and Mistral-7b. Additionally, Mistral-7b was fine-tuned using the QLoRA approach to enhance its predictive capabilities. Results: Among the CML models, XGBoost and RF achieved the highest accuracy, with F1 scores of 0.87 for internal validation and 0.83 for external validation. In the LLM category, GPT-4 was the top performer with an F1 score of 0.43. Fine-tuning Mistral-7b significantly improved its recall from 1% to 79%, resulting in an F1 score of 0.74, which was stable during external validation. Conclusion: While LLMs show moderate performance in zero-shot classification, fine-tuning can significantly enhance their effectiveness, potentially aligning them closer to CML models. However, CMLs still outperform LLMs in high-dimensional tabular data tasks.

* Code is available at: https://github.com/mohammad-gh009/Large-Language-Models-vs-Classical-Machine-learning and https://github.com/Sdamirsa/Tehran_COVID_Cohort. The datasets are available from the corresponding author on reasonable request (sdamirsa@ymail.com)

Via

Access Paper or Ask Questions