Abstract:Non-alcoholic fatty liver disease (NAFLD) is one of the most widespread liver disorders on a global scale, posing a significant threat of progressing to more severe conditions like nonalcoholic steatohepatitis (NASH), liver fibrosis, cirrhosis, and hepatocellular carcinoma. Diagnosing and staging NAFLD presents challenges due to its non-specific symptoms and the invasive nature of liver biopsies. Our research introduces a novel artificial intelligence cascade model employing ensemble learning and feature fusion techniques. We developed a non-invasive, robust, and reliable diagnostic artificial intelligence tool that utilizes anthropometric and laboratory parameters, facilitating early detection and intervention in NAFLD progression. Our novel artificial intelligence achieved an 86% accuracy rate for the NASH steatosis staging task (non-NASH, steatosis grade 1, steatosis grade 2, and steatosis grade 3) and an impressive 96% AUC-ROC for distinguishing between NASH (steatosis grade 1, grade 2, and grade3) and non-NASH cases, outperforming current state-of-the-art models. This notable improvement in diagnostic performance underscores the potential application of artificial intelligence in the early diagnosis and treatment of NAFLD, leading to better patient outcomes and a reduced healthcare burden associated with advanced liver disease.
Abstract:Background: This study aimed to evaluate and compare the performance of classical machine learning models (CMLs) and large language models (LLMs) in predicting mortality associated with COVID-19 by utilizing a high-dimensional tabular dataset. Materials and Methods: We analyzed data from 9,134 COVID-19 patients collected across four hospitals. Seven CML models, including XGBoost and random forest (RF), were trained and evaluated. The structured data was converted into text for zero-shot classification by eight LLMs, including GPT-4 and Mistral-7b. Additionally, Mistral-7b was fine-tuned using the QLoRA approach to enhance its predictive capabilities. Results: Among the CML models, XGBoost and RF achieved the highest accuracy, with F1 scores of 0.87 for internal validation and 0.83 for external validation. In the LLM category, GPT-4 was the top performer with an F1 score of 0.43. Fine-tuning Mistral-7b significantly improved its recall from 1% to 79%, resulting in an F1 score of 0.74, which was stable during external validation. Conclusion: While LLMs show moderate performance in zero-shot classification, fine-tuning can significantly enhance their effectiveness, potentially aligning them closer to CML models. However, CMLs still outperform LLMs in high-dimensional tabular data tasks.