Integrating different molecular layers, i.e., multiomics data, is crucial for unraveling the complexity of diseases; yet, most deep generative models either prioritize predictive performance at the expense of interpretability or enforce interpretability by linearizing the decoder, thereby weakening the network's nonlinear expressiveness. To overcome this trade-off, we introduce POEMS: Product Of Experts for Interpretable Multiomics Integration using Sparse Decoding, an unsupervised probabilistic framework that preserves predictive performance while providing interpretability. POEMS provides interpretability without linearizing any part of the network by 1) mapping features to latent factors using sparse connections, which directly translates to biomarker discovery, 2) allowing for cross-omic associations through a shared latent space using a product-of-experts model, and 3) reporting the contribution of each omic via a gating network that adaptively computes its influence on the learned representation. Additionally, we present an efficient sparse decoder. In a cancer subtyping case study, POEMS achieves competitive clustering and classification performance while offering a novel set of interpretations, demonstrating that biomarker-based insight and predictive accuracy can coexist in multiomics representation learning.
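The product-of-experts fusion POEMS uses for its shared latent space can be illustrated with a short sketch. For Gaussian experts the product is again Gaussian, with precisions that add and a precision-weighted mean; the function below is a minimal, hedged rendering of that rule (the standard-normal prior expert and all tensor shapes are assumptions for illustration, not the authors' exact formulation).

```python
# Gaussian product-of-experts fusion: precisions add, means are precision-weighted.
import torch

def poe_fuse(means, logvars):
    """Fuse per-omic Gaussian experts q_m(z | x_m) = N(mu_m, sigma_m^2).

    means, logvars: tensors of shape (n_omics, batch, latent_dim).
    Returns the mean and log-variance of the product distribution.
    """
    # A N(0, I) prior expert keeps the product defined even if an omic is missing.
    mu = torch.cat([torch.zeros_like(means[:1]), means], dim=0)
    logvar = torch.cat([torch.zeros_like(logvars[:1]), logvars], dim=0)

    precision = torch.exp(-logvar)                  # 1 / sigma^2 per expert
    fused_var = 1.0 / precision.sum(dim=0)
    fused_mean = fused_var * (mu * precision).sum(dim=0)
    return fused_mean, torch.log(fused_var)

# Example: fuse three omic-specific encoders for a batch of 8 samples.
mu_z, logvar_z = poe_fuse(torch.randn(3, 8, 16), torch.randn(3, 8, 16))
```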
The inherent multimodality and heterogeneous temporal structures of medical data pose significant challenges for modeling. We propose MedM2T, a time-aware multimodal framework designed to address these complexities. MedM2T integrates: (i) a Sparse Time Series Encoder to flexibly handle irregular and sparse time series, (ii) Hierarchical Time-Aware Fusion to capture both micro- and macro-temporal patterns from multiple dense time series, such as ECGs, and (iii) Bi-Modal Attention to extract cross-modal interactions, which can be extended to any number of modalities. To mitigate granularity gaps between modalities, MedM2T uses modality-specific pre-trained encoders and aligns the resulting features within a shared encoder. We evaluated MedM2T on the MIMIC-IV and MIMIC-IV-ECG datasets for three tasks that encompass chronic and acute disease dynamics: 90-day cardiovascular disease (CVD) prediction, in-hospital mortality prediction, and ICU length-of-stay (LOS) regression. MedM2T outperformed state-of-the-art multimodal learning frameworks and existing time series models, achieving an AUROC of 0.947 and an AUPRC of 0.706 for CVD prediction; an AUROC of 0.901 and an AUPRC of 0.558 for mortality prediction; and a Mean Absolute Error (MAE) of 2.31 for LOS regression. These results highlight the robustness and broad applicability of MedM2T, positioning it as a promising tool for clinical prediction. We provide the implementation of MedM2T at https://github.com/DHLab-TSENG/MedM2T.
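As a rough illustration of the cross-modal interaction step, the sketch below shows a symmetric bi-modal attention block in which each modality attends over the other; the layer sizes, pooling, and the use of PyTorch's MultiheadAttention are assumptions made for this example rather than MedM2T's actual implementation.

```python
import torch
import torch.nn as nn

class BiModalAttention(nn.Module):
    """Each modality queries the other; pooled outputs can feed a prediction head."""
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.a_to_b = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.b_to_a = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, feats_a, feats_b):
        a_ctx, _ = self.a_to_b(query=feats_a, key=feats_b, value=feats_b)
        b_ctx, _ = self.b_to_a(query=feats_b, key=feats_a, value=feats_a)
        return a_ctx.mean(dim=1), b_ctx.mean(dim=1)

# Example: combine ECG-derived tokens with sparse lab time-series tokens.
ecg_tokens, lab_tokens = torch.randn(2, 50, 128), torch.randn(2, 20, 128)
fused = torch.cat(BiModalAttention()(ecg_tokens, lab_tokens), dim=-1)  # (2, 256)
```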
A vast majority of mass spectrometry data remains uncharacterized, leaving much of its biological and chemical information untapped. Recent advances in machine learning have begun to address this gap, particularly for tasks such as spectral identification in tandem mass spectrometry data. Here, we present the latest generation of LSM-MS2, a large-scale deep learning foundation model trained on millions of spectra to learn a semantic chemical space. LSM-MS2 achieves state-of-the-art performance in spectral identification, improving on existing methods by 30% in accuracy of identifying challenging isomeric compounds, yielding 42% more correct identifications in complex biological samples, and maintaining robustness under low-concentration conditions. Furthermore, LSM-MS2 produces rich spectral embeddings that enable direct biological interpretation from minimal downstream data, successfully differentiating disease states and predicting clinical outcomes across diverse translational applications.
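Spectral identification with an embedding model of this kind typically reduces to nearest-neighbor search in the learned chemical space. The sketch below shows that retrieval step with cosine similarity over stand-in embeddings; it does not use LSM-MS2's actual API or preprocessing.

```python
import numpy as np

def identify(query_emb, library_embs, library_ids, top_k=5):
    """Rank library compounds by cosine similarity to the query spectrum embedding."""
    q = query_emb / np.linalg.norm(query_emb)
    lib = library_embs / np.linalg.norm(library_embs, axis=1, keepdims=True)
    scores = lib @ q
    order = np.argsort(scores)[::-1][:top_k]
    return [(library_ids[i], float(scores[i])) for i in order]

# Example with random stand-in embeddings for a 1,000-compound reference library.
rng = np.random.default_rng(0)
hits = identify(rng.normal(size=256), rng.normal(size=(1000, 256)),
                [f"compound_{i}" for i in range(1000)])
```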
Drug recommendation (DR) systems aim to support healthcare professionals in selecting appropriate medications based on patients' medical conditions. State-of-the-art approaches utilize deep learning techniques to improve DR, but fall short of providing any insight into how recommendations are derived -- a critical limitation in such high-stakes applications. We propose TraceDR, a novel DR system operating over a medical knowledge graph (MKG), which ensures access to large-scale and high-quality information. TraceDR simultaneously predicts drug recommendations and related evidence within a multi-task learning framework, enabling traceability of medication recommendations. To cover a more diverse set of diseases and drugs than existing works, we devise a framework for automatically constructing patient health records and release DrugRec, a new large-scale testbed for DR.
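A minimal sketch of the multi-task idea, assuming a shared patient representation and two heads, one scoring candidate drugs and one scoring supporting KG facts, combined by a weighted sum of binary cross-entropy losses; the dimensions and loss weighting are illustrative, not TraceDR's published architecture.

```python
import torch
import torch.nn as nn

class DrugEvidenceHeads(nn.Module):
    def __init__(self, hidden=256, n_drugs=500, n_facts=2000):
        super().__init__()
        self.drug_head = nn.Linear(hidden, n_drugs)      # multi-label drug scores
        self.evidence_head = nn.Linear(hidden, n_facts)  # scores over candidate KG facts

    def forward(self, patient_repr):
        return self.drug_head(patient_repr), self.evidence_head(patient_repr)

def multitask_loss(drug_logits, evid_logits, drug_labels, evid_labels, alpha=0.5):
    bce = nn.BCEWithLogitsLoss()
    return bce(drug_logits, drug_labels) + alpha * bce(evid_logits, evid_labels)

# Example forward/backward pass with random stand-in data.
heads = DrugEvidenceHeads()
d_logits, e_logits = heads(torch.randn(4, 256))
loss = multitask_loss(d_logits, e_logits,
                      torch.randint(0, 2, (4, 500)).float(),
                      torch.randint(0, 2, (4, 2000)).float())
loss.backward()
```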
Personalized treatment outcome prediction based on trial data for small-sample and rare patient groups is critical in precision medicine. However, the high cost of trial data limits prediction performance. To address this issue, we propose a cross-fidelity knowledge distillation and adaptive fusion network (CFKD-AFN), which leverages abundant but low-fidelity simulation data to enhance predictions on scarce but high-fidelity trial data. CFKD-AFN incorporates a dual-channel knowledge distillation module to extract complementary knowledge from the low-fidelity model, along with an attention-guided fusion module to dynamically integrate multi-source information. Experiments on treatment outcome prediction for chronic obstructive pulmonary disease demonstrate significant improvements of CFKD-AFN over state-of-the-art methods in prediction accuracy, ranging from 6.67% to 74.55%, as well as strong robustness to varying high-fidelity dataset sizes. Furthermore, we extend CFKD-AFN to an interpretable variant, enabling the exploration of latent medical semantics to support clinical decision-making.
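The cross-fidelity distillation idea can be sketched as a student trained on the scarce high-fidelity labels while also matching a teacher pre-trained on low-fidelity simulations. The single distillation channel and MSE terms below are simplifying assumptions; CFKD-AFN's dual-channel module and attention-guided fusion are not reproduced here.

```python
import torch
import torch.nn as nn

def distillation_loss(student_pred, teacher_pred, target, lam=0.3):
    """Weighted sum of the supervised loss and a teacher-matching term."""
    task = nn.functional.mse_loss(student_pred, target)           # high-fidelity labels
    distill = nn.functional.mse_loss(student_pred, teacher_pred)  # low-fidelity knowledge
    return (1 - lam) * task + lam * distill

student = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))
teacher = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1)).eval()

x, y = torch.randn(16, 32), torch.randn(16, 1)
with torch.no_grad():
    teacher_pred = teacher(x)        # teacher assumed pre-trained on simulation data
loss = distillation_loss(student(x), teacher_pred, y)
loss.backward()
```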
Continuous electrocardiogram (ECG) monitoring via wearables offers significant potential for early cardiovascular disease (CVD) detection. However, deploying deep learning models for automated analysis in resource-constrained environments faces reliability challenges due to inevitable Out-of-Distribution (OOD) data. OOD inputs, such as unseen pathologies or noise-corrupted signals, often cause erroneous, high-confidence predictions by standard classifiers, compromising patient safety. Existing OOD detection methods either neglect computational constraints or address noise and unseen classes separately. This paper explores Unsupervised Anomaly Detection (UAD) as an independent, upstream filtering mechanism to improve robustness. We benchmark six UAD approaches, including Deep SVDD, reconstruction-based models, Masked Anomaly Detection, normalizing flows, and diffusion models, optimized via Neural Architecture Search (NAS) under strict resource constraints (at most 512k parameters). Evaluation on the PTB-XL and BUT QDB datasets assessed detection of OOD CVD classes and of signals unsuitable for analysis due to noise. Results show that Deep SVDD consistently achieves the best trade-off between detection and efficiency. In a realistic deployment simulation, integrating the optimized Deep SVDD filter with a diagnostic classifier improved accuracy by up to 21 percentage points over a classifier-only baseline. This study demonstrates that optimized UAD filters can safeguard automated ECG analysis, enabling safer, more reliable continuous cardiovascular monitoring on wearables.
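As a sketch of the deployment pattern described above, the code below scores incoming ECG segments with a Deep SVDD-style anomaly score (squared distance of an embedding to a fixed center) and passes only in-distribution segments to the downstream classifier. The encoder, center, and threshold are placeholders; in practice the encoder is trained to minimize the mean distance to the center and the threshold is calibrated on validation data.

```python
import torch
import torch.nn as nn

class ECGEncoder(nn.Module):
    def __init__(self, latent=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 8, kernel_size=7, stride=4), nn.ReLU(),
            nn.Conv1d(8, 16, kernel_size=7, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(16, latent))

    def forward(self, x):
        return self.net(x)

def svdd_score(encoder, x, center):
    """Anomaly score = squared distance of the embedding to the SVDD center."""
    return ((encoder(x) - center) ** 2).sum(dim=1)

encoder, center, threshold = ECGEncoder(), torch.zeros(32), 5.0
ecg_batch = torch.randn(8, 1, 1000)             # 8 single-lead ECG segments
scores = svdd_score(encoder, ecg_batch, center)
accepted = ecg_batch[scores < threshold]        # only these reach the diagnostic classifier
```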
Background: Metabolic Dysfunction-Associated Steatotic Liver Disease (MASLD) affects ~33% of U.S. adults and is the most common chronic liver disease. Although often asymptomatic, progression can lead to cirrhosis. Early detection is important, as lifestyle interventions can prevent disease progression. We developed a fair, rigorous, and reproducible MASLD prediction model and compared it to prior methods using a large electronic health record database. Methods: We evaluated LASSO logistic regression, random forest, XGBoost, and a neural network for MASLD prediction using clinical feature subsets, including the top 10 SHAP-ranked features. To reduce disparities in true positive rates across racial and ethnic subgroups, we applied an equal opportunity postprocessing method. Results: This study included 59,492 patients in the training set, 24,198 in the validation set, and 25,188 in the test set. The LASSO logistic regression model with the top 10 features was selected for its interpretability and comparable performance. Before fairness adjustment, the model achieved an AUROC of 0.84, an accuracy of 78%, a sensitivity of 72%, a specificity of 79%, and an F1-score of 0.617. After equal opportunity postprocessing, accuracy modestly increased to 81% and specificity to 94%, while sensitivity decreased to 41% and the F1-score to 0.515, reflecting the fairness trade-off. Conclusions: We developed the MASER prediction model (MASLD Static EHR Risk Prediction), a LASSO logistic regression model that achieved competitive performance for MASLD prediction (AUROC 0.836, accuracy 77.6%), comparable to previously reported ensemble and tree-based models. Overall, this approach demonstrates that interpretable models can balance predictive performance and fairness in diverse patient populations.
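A hedged sketch of the modeling recipe: an L1-penalized (LASSO) logistic regression plus a simple equal-opportunity post-processing step that chooses group-specific thresholds to roughly equalize true positive rates. The threshold search and target TPR below are simplified stand-ins for the published method, and the data are synthetic.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_lasso_lr(X, y):
    return LogisticRegression(penalty="l1", solver="liblinear", C=1.0).fit(X, y)

def equal_opportunity_thresholds(scores, y, groups, target_tpr=0.72):
    """Pick, per subgroup, the threshold whose TPR on positives is closest to target_tpr."""
    thresholds, grid = {}, np.linspace(0.01, 0.99, 99)
    for g in np.unique(groups):
        pos_scores = scores[(groups == g) & (y == 1)]
        tprs = np.array([(pos_scores >= t).mean() for t in grid])
        thresholds[g] = grid[np.argmin(np.abs(tprs - target_tpr))]
    return thresholds

# Example with synthetic stand-in data (not the MASER EHR features).
rng = np.random.default_rng(1)
X, y = rng.normal(size=(1000, 10)), rng.integers(0, 2, 1000)
groups = rng.integers(0, 3, 1000)
scores = fit_lasso_lr(X, y).predict_proba(X)[:, 1]
print(equal_opportunity_thresholds(scores, y, groups))
```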
Image-to-image translation models have achieved notable success in converting images across visual domains and are increasingly used for medical tasks such as predicting post-operative outcomes and modeling disease progression. However, most existing methods primarily aim to match the target distribution and often neglect spatial correspondences between the source and translated images. This limitation can lead to structural inconsistencies and hallucinations, undermining the reliability and interpretability of the predictions. These challenges are accentuated in clinical applications by the stringent requirement for anatomical accuracy. In this work, we present TraceTrans, a novel deformable image translation model designed for post-operative prediction that generates images aligned with the target distribution while explicitly revealing spatial correspondences with the pre-operative input. The framework employs an encoder for feature extraction and dual decoders for predicting spatial deformations and synthesizing the translated image. The predicted deformation field imposes spatial constraints on the generated output, ensuring anatomical consistency with the source. Extensive experiments on medical cosmetology and brain MRI datasets demonstrate that TraceTrans delivers accurate and interpretable post-operative predictions, highlighting its potential for reliable clinical deployment.
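To make the role of the predicted deformation field concrete, the sketch below warps a source image with a dense displacement field via grid_sample, which is how such a field can tie the translated output back to the pre-operative input. The decoders that would produce the field and the translated image are omitted, and all shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def warp(source, displacement):
    """Warp a (B, C, H, W) image by a (B, 2, H, W) displacement given in pixels."""
    b, _, h, w = source.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().unsqueeze(0)   # identity grid (1, 2, H, W)
    new = base + displacement                                  # deformed pixel coordinates
    # Normalize to [-1, 1] in (x, y) order, as grid_sample expects.
    new_x = 2 * new[:, 0] / (w - 1) - 1
    new_y = 2 * new[:, 1] / (h - 1) - 1
    grid = torch.stack((new_x, new_y), dim=-1)                 # (B, H, W, 2)
    return F.grid_sample(source, grid, align_corners=True)

pre_op = torch.rand(1, 1, 128, 128)
field = torch.zeros(1, 2, 128, 128)        # zero displacement = identity warp
assert torch.allclose(warp(pre_op, field), pre_op, atol=1e-5)
```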
Accurate symptom-to-disease classification and clinically grounded treatment recommendations remain challenging, particularly in heterogeneous patient settings with high diagnostic risk. Existing large language model (LLM)-based systems often lack medical grounding and fail to quantify uncertainty, resulting in unsafe outputs. We propose CLIN-LLM, a safety-constrained hybrid pipeline that integrates multimodal patient encoding, uncertainty-calibrated disease classification, and retrieval-augmented treatment generation. The framework fine-tunes BioBERT on 1,200 clinical cases from the Symptom2Disease dataset and incorporates Focal Loss with Monte Carlo Dropout to enable confidence-aware predictions from free-text symptoms and structured vitals. Low-certainty cases (18%) are automatically flagged for expert review, ensuring human oversight. For treatment generation, CLIN-LLM employs Biomedical Sentence-BERT to retrieve top-k relevant dialogues from the 260,000-sample MedDialog corpus. The retrieved evidence and patient context are fed into a fine-tuned FLAN-T5 model for personalized treatment generation, followed by post-processing with RxNorm for antibiotic stewardship and drug-drug interaction (DDI) screening. CLIN-LLM achieves 98% accuracy and F1 score, outperforming ClinicalBERT by 7.1% (p < 0.001), with 78% top-5 retrieval precision and a clinician-rated validity of 4.2 out of 5. Unsafe antibiotic suggestions are reduced by 67% compared to GPT-5. These results demonstrate CLIN-LLM's robustness, interpretability, and clinical safety alignment. The proposed system provides a deployable, human-in-the-loop decision support framework for resource-limited healthcare environments. Future work includes integrating imaging and lab data, multilingual extensions, and clinical trial validation.
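The confidence-aware triage step can be sketched with Monte Carlo Dropout: repeated stochastic forward passes give a predictive mean and an uncertainty estimate, and high-uncertainty cases are routed to expert review. The classifier head, the entropy-based flagging rule, and the threshold are illustrative assumptions rather than CLIN-LLM's exact configuration.

```python
import torch
import torch.nn as nn

# Stand-in classifier over pooled BioBERT-style embeddings (24 disease classes assumed).
classifier = nn.Sequential(nn.Linear(768, 256), nn.ReLU(),
                           nn.Dropout(0.3), nn.Linear(256, 24))

def mc_dropout_predict(model, x, n_samples=20):
    model.train()                      # keep dropout active at inference time
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(n_samples)])
    mean = probs.mean(dim=0)
    entropy = -(mean * mean.clamp_min(1e-9).log()).sum(dim=-1)  # predictive entropy
    return mean, entropy

features = torch.randn(5, 768)
mean_probs, uncertainty = mc_dropout_predict(classifier, features)
needs_review = uncertainty > 1.5       # threshold would be tuned to flag ~18% of cases
```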
Electrocardiogram (ECG) interpretation is essential for cardiovascular disease diagnosis, but current automated systems often struggle with transparency and generalization to unseen conditions. To address this, we introduce ZETA, a zero-shot multimodal framework designed for interpretable ECG diagnosis aligned with clinical workflows. ZETA uniquely compares ECG signals against structured positive and negative clinical observations, which are curated through an LLM-assisted, expert-validated process, thereby mimicking differential diagnosis. Our approach leverages a pre-trained multimodal model to align ECG and text embeddings without disease-specific fine-tuning. Empirical evaluations demonstrate ZETA's competitive zero-shot classification performance and, importantly, provide qualitative and quantitative evidence of enhanced interpretability, grounding predictions in specific, clinically relevant positive and negative diagnostic features. ZETA underscores the potential of aligning ECG analysis with structured clinical knowledge for building more transparent, generalizable, and trustworthy AI diagnostic systems. We will release the curated observation dataset and code to facilitate future research.
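A minimal sketch of the zero-shot scoring idea: embed the ECG and a set of positive and negative clinical observations in a shared space, then score a diagnosis by how much more strongly the ECG aligns with the positive set. The encoders, observation texts, and the difference-of-means score are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def zero_shot_score(ecg_emb, pos_text_embs, neg_text_embs):
    """Higher scores mean the ECG aligns more with the positive observations."""
    pos = F.cosine_similarity(ecg_emb.unsqueeze(0), pos_text_embs).mean()
    neg = F.cosine_similarity(ecg_emb.unsqueeze(0), neg_text_embs).mean()
    return pos - neg

# Example with stand-in embeddings from a shared ECG-text space.
ecg_embedding = torch.randn(512)
positive_obs = torch.randn(4, 512)     # e.g. embeddings of "ST elevation in V1-V4"
negative_obs = torch.randn(4, 512)     # e.g. embeddings of "no ST-segment changes"
score = zero_shot_score(ecg_embedding, positive_obs, negative_obs)
```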