Abstract: This study evaluates metrics for tasks such as classification, regression, clustering, correlation analysis, statistical tests, segmentation, and image-to-image (I2I) translation. Metrics were compared across Python libraries, R packages, and MATLAB functions to assess their consistency and highlight discrepancies. The findings underscore the need for a unified roadmap to standardize metrics, ensuring reliable and reproducible ML evaluations across platforms. This study examined a wide range of evaluation metrics across various tasks and found only some to be consistent across platforms, namely (i) Accuracy, Balanced Accuracy, Cohen's Kappa, F-beta Score, MCC, Geometric Mean, AUC, and Log Loss in binary classification; (ii) Accuracy, Cohen's Kappa, and F-beta Score in multi-class classification; (iii) MAE, MSE, RMSE, MAPE, Explained Variance, Median AE, MSLE, and Huber in regression; (iv) Davies-Bouldin Index and Calinski-Harabasz Index in clustering; (v) Pearson, Spearman, Kendall's Tau, Mutual Information, Distance Correlation, Percbend, Shepherd, and Partial Correlation in correlation analysis; (vi) Paired t-test, Chi-Square Test, ANOVA, Kruskal-Wallis Test, Shapiro-Wilk Test, Welch's t-test, and Bartlett's test in statistical tests; (vii) Accuracy, Precision, and Recall in 2D segmentation; (viii) Accuracy in 3D segmentation; (ix) MAE, MSE, RMSE, and R-Squared in 2D-I2I translation; and (x) MAE, MSE, and RMSE in 3D-I2I translation. Given the observed discrepancies in a number of metrics (e.g., precision, recall, and F1 score in binary classification; WCSS in clustering; multiple statistical tests; and IoU in segmentation, among others), this study concludes that ML evaluation metrics require standardization and recommends that future research use consistent metrics for different tasks to effectively compare ML techniques and solutions.
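One common source of such cross-platform discrepancies is the averaging convention and positive-class label that each library defaults to. The following is a minimal sketch, assuming scikit-learn and hypothetical toy labels, of how precision, recall, and F1 shift under different averaging choices:

```python
# Hedged sketch (assumes scikit-learn; data are hypothetical): the same
# predictions yield different precision/recall/F1 depending on the
# averaging convention, one of the defaults that varies across libraries.
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [0, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1, 1, 0]

for avg in ("binary", "macro", "micro", "weighted"):
    p = precision_score(y_true, y_pred, average=avg)
    r = recall_score(y_true, y_pred, average=avg)
    f = f1_score(y_true, y_pred, average=avg)
    print(f"{avg:>8}: precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```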
Abstract: Purpose: Artificial intelligence (AI) techniques have been extensively utilized for the diagnosis and prognosis of several diseases in recent years. This study identifies, appraises, and synthesizes published studies on the use of AI for the prognosis of COVID-19. Method: An electronic search was performed using Medline, Google Scholar, Scopus, Embase, Cochrane, and ProQuest. Studies that examined machine learning or deep learning methods to determine the prognosis of COVID-19 using CT or chest X-ray (CXR) images were included. Pooled sensitivity, specificity, area under the curve, and diagnostic odds ratio were calculated. Result: A total of 36 articles were included; various prognosis-related issues, including disease severity, mechanical ventilation or admission to the intensive care unit, and mortality, were investigated. Several AI models and architectures were employed, such as the Siamese model, support vector machines, random forests, eXtreme Gradient Boosting, and convolutional neural networks. The models achieved sensitivities of 71%, 88%, and 67% for mortality, severity assessment, and need for ventilation, respectively, with corresponding specificities of 69%, 89%, and 89%. Conclusion: Based on the included articles, machine learning and deep learning methods used for the prognosis of COVID-19 patients using radiomic features from CT or CXR images can help clinicians manage patients and allocate resources more effectively. These studies also demonstrate that combining patient demographics, clinical data, laboratory tests, and radiomic features improves model performance.
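For readers unfamiliar with the diagnostic odds ratio (DOR), the sketch below shows the standard formula DOR = [sens/(1-sens)] * [spec/(1-spec)]; it reuses the pooled mortality estimates quoted above purely as a worked example, whereas the study's actual pooling would be computed from the full 2x2 counts of each article:

```python
# Hedged worked example: diagnostic odds ratio from pooled sensitivity
# and specificity. Equivalent to LR+ / LR-.
def diagnostic_odds_ratio(sensitivity: float, specificity: float) -> float:
    positive_lr = sensitivity / (1.0 - specificity)   # LR+
    negative_lr = (1.0 - sensitivity) / specificity   # LR-
    return positive_lr / negative_lr

# Mortality estimates from the abstract: sensitivity 71%, specificity 69%.
print(f"DOR (mortality): {diagnostic_odds_ratio(0.71, 0.69):.2f}")  # ~5.45
```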
Abstract: Prostate-specific membrane antigen (PSMA) positron emission tomography/computed tomography (PET/CT) imaging provides an exciting frontier in the visualization of prostate cancer (PCa) metastatic lesions. However, accurate segmentation of metastatic lesions is challenging due to low signal-to-noise ratios and the variable sizes, shapes, and locations of the lesions. This study proposes a novel approach for automated segmentation of metastatic lesions in PSMA PET/CT 3D volumetric images using 2D denoising diffusion probabilistic models (DDPMs). Instead of 2D trans-axial slices or 3D volumes, the proposed approach segments the lesions on generated multi-angle maximum intensity projections (MA-MIPs) of the PSMA PET images, then obtains the final 3D segmentation masks from 3D ordered-subset expectation maximization (OSEM) reconstruction of the 2D MA-MIP segmentations. Our proposed method achieved superior performance compared to state-of-the-art 3D segmentation approaches in terms of accuracy and robustness in detecting and segmenting small metastatic PCa lesions. The proposed method has significant potential as a tool for quantitative analysis of metastatic burden in PCa patients.
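To make the MA-MIP input concrete, here is a minimal sketch (assuming NumPy/SciPy; the angle count, array layout, and interpolation order are illustrative choices, not the paper's settings) of computing multi-angle maximum intensity projections by rotating a PET volume about its axial axis:

```python
# Hedged sketch: multi-angle MIPs of a 3D PET volume. Rotate in the
# in-plane (y, x) axes, then take the max along y for each angle.
import numpy as np
from scipy.ndimage import rotate

def ma_mips(volume: np.ndarray, n_angles: int = 16) -> np.ndarray:
    """volume: (z, y, x) PET array -> (n_angles, z, x) stack of MIPs."""
    mips = []
    for angle in np.linspace(0.0, 180.0, n_angles, endpoint=False):
        rotated = rotate(volume, angle, axes=(1, 2), reshape=False, order=1)
        mips.append(rotated.max(axis=1))  # project along y
    return np.stack(mips)

pet = np.random.rand(64, 128, 128).astype(np.float32)  # placeholder volume
print(ma_mips(pet, n_angles=8).shape)  # (8, 64, 128)
```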
Abstract: The objective of this study was to develop an automated pipeline that enhances thyroid disease classification using thyroid scintigraphy images, aiming to decrease assessment time and increase diagnostic accuracy. Anterior thyroid scintigraphy images from 2,643 patients were collected, categorized into diffuse goiter (DG), multinodular goiter (MNG), and thyroiditis (TH) based on clinical reports, and then segmented by an expert. A ResUNet model was trained to perform auto-segmentation. Radiomic features were extracted from both physician segmentations (scenario 1) and ResUNet segmentations (scenario 2), followed by removal of highly correlated features using Spearman's correlation and feature selection using Recursive Feature Elimination (RFE) with XGBoost as the core estimator. All models were trained under a leave-one-center-out cross-validation (LOCOCV) scheme, in which nine instances of the algorithms were iteratively trained and validated on data from eight centers and tested on the ninth, for both scenarios separately. Segmentation performance was assessed using the Dice similarity coefficient (DSC), while classification performance was assessed using metrics such as precision, recall, F1-score, accuracy, area under the receiver operating characteristic curve (ROC AUC), and area under the precision-recall curve (PRC AUC). ResUNet achieved DSC values of 0.84$\pm$0.03, 0.71$\pm$0.06, and 0.86$\pm$0.02 for MNG, TH, and DG, respectively. Classification in scenario 1 achieved an accuracy of 0.76$\pm$0.04 and a ROC AUC of 0.92$\pm$0.02, while in scenario 2 it yielded an accuracy of 0.74$\pm$0.05 and a ROC AUC of 0.90$\pm$0.02. The automated pipeline demonstrated performance comparable to physician segmentations on several classification metrics across the different classes, effectively reducing assessment time while maintaining high diagnostic accuracy. Code available at: https://github.com/ahxmeds/thyroidiomics.git.
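The feature-selection stage described above follows a standard two-step recipe; the sketch below (assuming scikit-learn, SciPy, and xgboost; the 0.9 correlation threshold, feature count, and placeholder data are illustrative, not the study's reported settings) drops one feature of each highly Spearman-correlated pair and then runs RFE with an XGBoost core:

```python
# Hedged sketch: Spearman-based de-correlation followed by RFE(XGBoost).
import numpy as np
import pandas as pd
from scipy.stats import spearmanr
from sklearn.feature_selection import RFE
from xgboost import XGBClassifier

def drop_correlated(X: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    rho = np.abs(spearmanr(X.values).correlation)
    upper = np.triu(rho, k=1)  # compare each feature with earlier ones only
    keep = [c for i, c in enumerate(X.columns)
            if not (upper[:, i] > threshold).any()]
    return X[keep]

X = pd.DataFrame(np.random.rand(200, 50),
                 columns=[f"radiomic_{i}" for i in range(50)])  # placeholder
y = np.random.randint(0, 3, size=200)  # placeholder DG / MNG / TH labels

X_reduced = drop_correlated(X)
selector = RFE(XGBClassifier(eval_metric="mlogloss"), n_features_to_select=10)
selector.fit(X_reduced, y)
print(list(X_reduced.columns[selector.support_]))
```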
Abstract: Optimal experimental design is a well-studied field in applied science and engineering. Techniques for estimating such a design are commonly used within the framework of parameter estimation. Nonetheless, in recent years parameter estimation techniques have been changing rapidly with the introduction of deep learning techniques that replace traditional estimation methods. This, in turn, requires adapting optimal experimental design to these new techniques. In this paper we investigate a new experimental design methodology that uses deep learning. We show that training a network as a likelihood-free estimator can significantly simplify the design process and circumvent the computationally expensive bi-level optimization problem that is inherent in optimal experimental design for non-linear systems. Furthermore, this deep design approach improves the quality of the recovery process for parameter estimation problems. As a proof of concept, we apply our methodology to two different systems of ordinary differential equations.
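To illustrate the likelihood-free estimation idea in the simplest terms, the sketch below (assuming SciPy and scikit-learn; the logistic-growth ODE, noise level, and network are illustrative stand-ins, not the paper's systems or architecture) trains a regressor to map simulated noisy trajectories back to the generating parameter, with no explicit likelihood anywhere:

```python
# Hedged sketch: a likelihood-free estimator for dx/dt = theta*x*(1-x).
import numpy as np
from scipy.integrate import solve_ivp
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
t_eval = np.linspace(0.0, 5.0, 20)

def simulate(theta: float) -> np.ndarray:
    sol = solve_ivp(lambda t, x: theta * x * (1 - x), (0, 5), [0.1],
                    t_eval=t_eval)
    return sol.y[0] + rng.normal(0, 0.01, size=t_eval.size)  # noisy obs.

thetas = rng.uniform(0.5, 2.0, size=2000)          # sample the prior
X = np.stack([simulate(th) for th in thetas])      # simulated designs

estimator = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500,
                         random_state=0)
estimator.fit(X, thetas)                           # trajectory -> parameter
print("recovered theta:", estimator.predict(simulate(1.3)[None, :]))
```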
Abstract: The 2nd SNMMI Artificial Intelligence (AI) Summit, organized by the SNMMI AI Task Force, took place in Bethesda, MD, on February 29 - March 1, 2024. Bringing together various community members and stakeholders, and following up on the successful 2022 AI Summit, the summit's theme was AI in Action. The six key topics included (i) an overview of prior and ongoing efforts by the AI Task Force, (ii) emerging needs and tools for computational nuclear oncology, (iii) new frontiers in large language and generative models, (iv) defining the value proposition for the use of AI in nuclear medicine, (v) open science, including efforts for data and model repositories, and (vi) issues of reimbursement and funding. The primary efforts, findings, challenges, and next steps are summarized in this manuscript.
Abstract: Dynamic Positron Emission Tomography (dPET) imaging and Time-Activity Curve (TAC) analyses are essential for understanding and quantifying the biodistribution of radiopharmaceuticals over time and space. Traditional compartmental modeling, while foundational, commonly struggles to fully capture the complexities of biological systems, including non-linear dynamics and variability. This study introduces an innovative data-driven, neural network-based framework, inspired by reaction-diffusion systems, designed to address these limitations. Our approach, which adaptively fits TACs from dPET, enables the direct calibration of diffusion coefficients and reaction terms from observed data, offering significant improvements in predictive accuracy and robustness over traditional methods, especially in complex biological scenarios. By more accurately modeling the spatio-temporal dynamics of radiopharmaceuticals, our method advances the modeling of pharmacokinetic and pharmacodynamic processes, enabling new possibilities in quantitative nuclear medicine.
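The calibration idea can be illustrated with a toy reaction-diffusion model; the sketch below (assuming PyTorch; the 1D grid, explicit scheme, linear reaction term, and synthetic "TACs" are all illustrative assumptions, not the paper's model) fits a diffusion coefficient D and reaction rate k directly to observed curves by gradient descent:

```python
# Hedged sketch: calibrate D and k in u_t = D*u_xx + k*u against
# observed time-activity curves, using a differentiable rollout.
import torch

n_x, n_t, dx, dt = 32, 50, 1.0, 0.1
u0 = torch.exp(-0.5 * ((torch.arange(n_x, dtype=torch.float32) - 16) / 3) ** 2)

def rollout(D, k):
    u, frames = u0.clone(), []
    for _ in range(n_t):
        lap = (torch.roll(u, 1) - 2 * u + torch.roll(u, -1)) / dx ** 2
        u = u + dt * (D * lap + k * u)   # explicit Euler step
        frames.append(u)
    return torch.stack(frames)           # (n_t, n_x): a TAC per voxel

# Synthetic "observed" data from known parameters D=0.8, k=-0.05.
observed = rollout(torch.tensor(0.8), torch.tensor(-0.05)).detach()

D = torch.tensor(0.1, requires_grad=True)
k = torch.tensor(0.0, requires_grad=True)
opt = torch.optim.Adam([D, k], lr=0.05)
for step in range(300):
    opt.zero_grad()
    loss = torch.mean((rollout(D, k) - observed) ** 2)
    loss.backward()
    opt.step()
print(f"D={D.item():.3f}, k={k.item():.3f}")  # should approach 0.8, -0.05
```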
Abstract: We introduce an innovative, simple, and effective segmentation-free approach for outcome prediction in head and neck cancer (HNC) patients. By harnessing deep learning-based feature extraction techniques and multi-angle maximum intensity projections (MA-MIPs) applied to Fluorodeoxyglucose Positron Emission Tomography (FDG-PET) volumes, our proposed method eliminates the need for manual segmentation of regions of interest (ROIs) such as primary tumors and involved lymph nodes. Instead, a state-of-the-art object detection model is trained to automatically crop the head and neck region on the PET volumes. A pre-trained deep convolutional neural network backbone is then used to extract deep features from MA-MIPs obtained from 72 multi-angle axial rotations of the cropped PET volumes. The deep features extracted from the multiple projection views are aggregated and fused, and employed to perform recurrence-free survival analysis on a cohort of 489 HNC patients. The proposed approach outperforms the best-performing method on the target dataset for the task of recurrence-free survival analysis. By circumventing manual delineation of the malignancies on the FDG PET-CT images, our approach eliminates the dependency on subjective interpretations and greatly enhances the reproducibility of the proposed survival analysis method.
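The extract-and-fuse step can be sketched as follows (assuming PyTorch/torchvision; the ResNet-18 backbone, mean-pool fusion, and placeholder views are illustrative choices, and the downstream survival model is omitted): each MIP view is passed through a pretrained backbone and the per-view features are averaged into one patient embedding.

```python
# Hedged sketch: deep features from MA-MIP views via a pretrained CNN,
# fused by mean pooling across views.
import torch
from torchvision.models import resnet18, ResNet18_Weights

backbone = resnet18(weights=ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()  # expose the 512-d penultimate features
backbone.eval()

mips = torch.rand(72, 1, 224, 224)  # placeholder: 72 rotated MIP views
with torch.no_grad():
    feats = backbone(mips.repeat(1, 3, 1, 1))  # grayscale -> 3 channels
patient_embedding = feats.mean(dim=0)          # fuse across views
print(patient_embedding.shape)                 # torch.Size([512])
```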
Abstract: Minimizing the need for pixel-level annotated data to train PET anomaly segmentation networks is crucial, particularly given the time and cost constraints of expert annotations. Current un-/weakly-supervised anomaly detection methods rely on autoencoders or generative adversarial networks trained only on healthy data, although these models are more challenging to train. In this work, we present a weakly supervised and Implicitly guided COuNterfactual diffusion model for Detecting Anomalies in PET images, branded as IgCONDA-PET. The training is conditioned on image class labels (healthy vs. unhealthy) along with implicit guidance to generate counterfactuals for an unhealthy image with anomalies. The counterfactual generation process synthesizes the healthy counterpart of a given unhealthy image, and the difference between the two facilitates the identification of anomaly locations. The code is available at: https://github.com/igcondapet/IgCONDA-PET.git
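The final localization step is simple enough to sketch directly (assuming NumPy; the diffusion sampler itself is omitted, and the counterfactual here is a placeholder, not the model's guided output): the anomaly map is the normalized voxelwise difference between the unhealthy input and its generated healthy counterpart.

```python
# Hedged sketch: anomaly map as |unhealthy - counterfactual|, normalized.
import numpy as np

def anomaly_map(unhealthy: np.ndarray, counterfactual: np.ndarray) -> np.ndarray:
    diff = np.abs(unhealthy - counterfactual)
    return (diff - diff.min()) / (np.ptp(diff) + 1e-8)  # scale to [0, 1]

unhealthy = np.random.rand(128, 128)   # placeholder PET slice
counterfactual = unhealthy * 0.8       # placeholder "healthy" generation
mask = anomaly_map(unhealthy, counterfactual) > 0.5
print(mask.sum(), "voxels flagged")
```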
Abstract: The effectiveness of Deep Neural Networks (DNNs) heavily relies on the abundance and accuracy of available training data. However, collecting and annotating data on a large scale is often both costly and time-intensive, particularly in medical settings where practitioners are already occupied with their duties. Moreover, ensuring that the model remains robust across various image-capture scenarios is crucial in medical domains, especially for ultrasound images, which vary with the settings of different devices and the manual operation of the transducer. To address this challenge, we introduce a novel pipeline called MEDDAP, which leverages Stable Diffusion (SD) models to augment existing small datasets by automatically generating new, informative, labeled samples. Pretrained SD checkpoints are typically based on natural images, and training them for medical images requires significant GPU resources due to their large number of parameters. To overcome this challenge, we introduce USLoRA (Ultrasound Low-Rank Adaptation), a novel fine-tuning method tailored specifically for ultrasound applications. USLoRA allows selective fine-tuning of weights within SD, requiring fewer than 0.1% of the parameters needed to fully fine-tune even the UNet portion of SD alone. To enhance dataset diversity, we incorporate different adjectives into the generation prompts, thereby desensitizing the classifiers to intensity changes across different images. This approach is inspired by clinicians' decision-making processes regarding breast tumors, where tumor shape often plays a more crucial role than intensity. In conclusion, our pipeline not only outperforms classifiers trained on the original dataset but also demonstrates superior performance on unseen datasets. The source code is available at https://github.com/yasamin-med/MEDDAP.
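The parameter savings come from the low-rank adaptation idea; the sketch below (assuming PyTorch; this is a generic LoRA layer, not the paper's USLoRA implementation, and the layer sizes are hypothetical) freezes a pretrained linear weight W and trains only a small update (alpha/r) * B @ A, which is how fine-tuning can touch well under 1% of the weights:

```python
# Hedged sketch: a minimal LoRA-wrapped linear layer.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False            # freeze pretrained weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(768, 768))        # hypothetical SD sub-layer size
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable fraction: {trainable / total:.4f}")  # ~0.01 for r=4
```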