Abstract:Pancreatic ductal adenocarcinoma (PDAC) is a highly aggressive cancer, with most cases diagnosed at stage IV and a five-year overall survival rate below 5%. Early detection and prognosis modeling are crucial for improving patient outcomes and guiding early intervention strategies. In this study, we developed and evaluated a deep learning fusion model that integrates radiology reports and CT imaging to predict PDAC risk. The model achieved a concordance index (C-index) of 0.6750 (95% CI: 0.6429, 0.7121) and 0.6435 (95% CI: 0.6055, 0.6789) on the internal and external dataset, respectively, for 5-year survival risk estimation. Kaplan-Meier analysis demonstrated significant separation (p<0.0001) between the low and high risk groups predicted by the fusion model. These findings highlight the potential of deep learning-based survival models in leveraging clinical and imaging data for pancreatic cancer.
Abstract:Women are underdiagnosed and undertreated for cardiovascular disease. Automatic quantification of breast arterial calcification on screening mammography can identify women at risk for cardiovascular disease and enable earlier treatment and management of disease. In this retrospective study of 116,135 women from two healthcare systems, a transformer-based neural network quantified BAC severity (no BAC, mild, moderate, and severe) on screening mammograms. Outcomes included major adverse cardiovascular events (MACE) and all-cause mortality. BAC severity was independently associated with MACE after adjusting for cardiovascular risk factors, with increasing hazard ratios from mild (HR 1.18-1.22), moderate (HR 1.38-1.47), to severe BAC (HR 2.03-2.22) across datasets (all p<0.001). This association remained significant across all age groups, with even mild BAC indicating increased risk in women under 50. BAC remained an independent predictor when analyzed alongside ASCVD risk scores, showing significant associations with myocardial infarction, stroke, heart failure, and mortality (all p<0.005). Automated BAC quantification enables opportunistic cardiovascular risk assessment during routine mammography without additional radiation or cost. This approach provides value beyond traditional risk factors, particularly in younger women, offering potential for early CVD risk stratification in the millions of women undergoing annual mammography.
Abstract:Out-of-distribution (OOD) detection is crucial for enhancing the generalization of AI models used in mammogram screening. Given the challenge of limited prior knowledge about OOD samples in external datasets, unsupervised generative learning is a preferable solution which trains the model to discern the normal characteristics of in-distribution (ID) data. The hypothesis is that during inference, the model aims to reconstruct ID samples accurately, while OOD samples exhibit poorer reconstruction due to their divergence from normality. Inspired by state-of-the-art (SOTA) hybrid architectures combining CNNs and transformers, we developed a novel backbone - HAND, for detecting OOD from large-scale digital screening mammogram studies. To boost the learning efficiency, we incorporated synthetic OOD samples and a parallel discriminator in the latent space to distinguish between ID and OOD samples. Gradient reversal to the OOD reconstruction loss penalizes the model for learning OOD reconstructions. An anomaly score is computed by weighting the reconstruction and discriminator loss. On internal RSNA mammogram held-out test and external Mayo clinic hand-curated dataset, the proposed HAND model outperformed encoder-based and GAN-based baselines, and interestingly, it also outperformed the hybrid CNN+transformer baselines. Therefore, the proposed HAND pipeline offers an automated efficient computational solution for domain-specific quality checks in external screening mammograms, yielding actionable insights without direct exposure to the private medical imaging data.
Abstract:A visual-language model (VLM) pre-trained on natural images and text pairs poses a significant barrier when applied to medical contexts due to domain shift. Yet, adapting or fine-tuning these VLMs for medical use presents considerable hurdles, including domain misalignment, limited access to extensive datasets, and high-class imbalances. Hence, there is a pressing need for strategies to effectively adapt these VLMs to the medical domain, as such adaptations would prove immensely valuable in healthcare applications. In this study, we propose a framework designed to adeptly tailor VLMs to the medical domain, employing selective sampling and hard-negative mining techniques for enhanced performance in retrieval tasks. We validate the efficacy of our proposed approach by implementing it across two distinct VLMs: the in-domain VLM (MedCLIP) and out-of-domain VLMs (ALBEF). We assess the performance of these models both in their original off-the-shelf state and after undergoing our proposed training strategies, using two extensive datasets containing mammograms and their corresponding reports. Our evaluation spans zero-shot, few-shot, and supervised scenarios. Through our approach, we observe a notable enhancement in Recall@K performance for the image-text retrieval task.
Abstract:The integration of Large Language Models (LLMs) into the healthcare domain has the potential to significantly enhance patient care and support through the development of empathetic, patient-facing chatbots. This study investigates an intriguing question Can ChatGPT respond with a greater degree of empathy than those typically offered by physicians? To answer this question, we collect a de-identified dataset of patient messages and physician responses from Mayo Clinic and generate alternative replies using ChatGPT. Our analyses incorporate novel empathy ranking evaluation (EMRank) involving both automated metrics and human assessments to gauge the empathy level of responses. Our findings indicate that LLM-powered chatbots have the potential to surpass human physicians in delivering empathetic communication, suggesting a promising avenue for enhancing patient care and reducing professional burnout. The study not only highlights the importance of empathy in patient interactions but also proposes a set of effective automatic empathy ranking metrics, paving the way for the broader adoption of LLMs in healthcare.
Abstract:Purpose: To analyze the demographic and imaging characteristics associated with increased risk of failure for abnormality classification in screening mammograms. Materials and Methods: This retrospective study used data from the Emory BrEast Imaging Dataset (EMBED) which includes mammograms from 115,931 patients imaged at Emory University Healthcare between 2013 to 2020. Clinical and imaging data includes Breast Imaging Reporting and Data System (BI-RADS) assessment, region of interest coordinates for abnormalities, imaging features, pathologic outcomes, and patient demographics. Multiple deep learning models were developed to distinguish between patches of abnormal tissue and randomly selected patches of normal tissue from the screening mammograms. We assessed model performance overall and within subgroups defined by age, race, pathologic outcome, and imaging characteristics to evaluate reasons for misclassifications. Results: On a test set size of 5,810 studies (13,390 patches), a ResNet152V2 model trained to classify normal versus abnormal tissue patches achieved an accuracy of 92.6% (95% CI = 92.0-93.2%), and area under the receiver operative characteristics curve 0.975 (95% CI = 0.972-0.978). Imaging characteristics associated with higher misclassifications of images include higher tissue densities (risk ratio [RR]=1.649; p=.010, BI-RADS density C and RR=2.026; p=.003, BI-RADS density D), and presence of architectural distortion (RR=1.026; p<.001). Conclusion: Even though deep learning models for abnormality classification can perform well in screening mammography, we demonstrate certain imaging features that result in worse model performance. This is the first such work to systematically evaluate breast abnormality classification by various subgroups and better-informed developers and end-users of population subgroups which are likely to experience biased model performance.
Abstract:Migraine is a high-prevalence and disabling neurological disorder. However, information migraine management in real-world settings could be limited to traditional health information sources. In this paper, we (i) verify that there is substantial migraine-related chatter available on social media (Twitter and Reddit), self-reported by migraine sufferers; (ii) develop a platform-independent text classification system for automatically detecting self-reported migraine-related posts, and (iii) conduct analyses of the self-reported posts to assess the utility of social media for studying this problem. We manually annotated 5750 Twitter posts and 302 Reddit posts. Our system achieved an F1 score of 0.90 on Twitter and 0.93 on Reddit. Analysis of information posted by our 'migraine cohort' revealed the presence of a plethora of relevant information about migraine therapies and patient sentiments associated with them. Our study forms the foundation for conducting an in-depth analysis of migraine-related information using social media data.
Abstract:Controlled Markov chains (CMCs) form the bedrock for model-based reinforcement learning. In this work, we consider the estimation of the transition probability matrices of a finite-state finite-control CMC using a fixed dataset, collected using a so-called logging policy, and develop minimax sample complexity bounds for nonparametric estimation of these transition probability matrices. Our results are general, and the statistical bounds depend on the logging policy through a natural mixing coefficient. We demonstrate an interesting trade-off between stronger assumptions on mixing versus requiring more samples to achieve a particular PAC-bound. We demonstrate the validity of our results under various examples, such as ergodic Markov chains, weakly ergodic inhomogeneous Markov chains, and controlled Markov chains with non-stationary Markov, episodic, and greedy controls. Lastly, we use these sample complexity bounds to establish concomitant ones for offline evaluation of stationary, Markov policies.
Abstract:We study the meta-learning for support (i.e. the set of non-zero entries) recovery in high-dimensional Principal Component Analysis. We reduce the sufficient sample complexity in a novel task with the information that is learned from auxiliary tasks. We assume each task to be a different random Principal Component (PC) matrix with a possibly different support and that the support union of the PC matrices is small. We then pool the data from all the tasks to execute an improper estimation of a single PC matrix by maximising the $l_1$-regularised predictive covariance to establish that with high probability the true support union can be recovered provided a sufficient number of tasks $m$ and a sufficient number of samples $ O\left(\frac{\log(p)}{m}\right)$ for each task, for $p$-dimensional vectors. Then, for a novel task, we prove that the maximisation of the $l_1$-regularised predictive covariance with the additional constraint that the support is a subset of the estimated support union could reduce the sufficient sample complexity of successful support recovery to $O(\log |J|)$, where $J$ is the support union recovered from the auxiliary tasks. Typically, $|J|$ would be much less than $p$ for sparse matrices. Finally, we demonstrate the validity of our experiments through numerical simulations.
Abstract:Language modality within the vision language pretraining framework is innately discretized, endowing each word in the language vocabulary a semantic meaning. In contrast, visual modality is inherently continuous and high-dimensional, which potentially prohibits the alignment as well as fusion between vision and language modalities. We therefore propose to "discretize" the visual representation by joint learning a codebook that imbues each visual token a semantic. We then utilize these discretized visual semantics as self-supervised ground-truths for building our Masked Image Modeling objective, a counterpart of Masked Language Modeling which proves successful for language models. To optimize the codebook, we extend the formulation of VQ-VAE which gives a theoretic guarantee. Experiments validate the effectiveness of our approach across common vision-language benchmarks.