Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Wenwen Min

GraphMMP: A Graph Neural Network Model with Mutual Information and Global Fusion for Multimodal Medical Prognosis

Aug 24, 2025

Xuhao Shan, Ruiquan Ge, Jikui Liu, Linglong Wu, Chi Zhang, Siqi Liu, Wenjian Qin, Wenwen Min, Ahmed Elazab, Changmiao Wang

Abstract:In the field of multimodal medical data analysis, leveraging diverse types of data and understanding their hidden relationships continues to be a research focus. The main challenges lie in effectively modeling the complex interactions between heterogeneous data modalities with distinct characteristics while capturing both local and global dependencies across modalities. To address these challenges, this paper presents a two-stage multimodal prognosis model, GraphMMP, which is based on graph neural networks. The proposed model constructs feature graphs using mutual information and features a global fusion module built on Mamba, which significantly boosts prognosis performance. Empirical results show that GraphMMP surpasses existing methods on datasets related to liver prognosis and the METABRIC study, demonstrating its effectiveness in multimodal medical prognosis tasks.

Via

Access Paper or Ask Questions

ITCFN: Incomplete Triple-Modal Co-Attention Fusion Network for Mild Cognitive Impairment Conversion Prediction

Jan 20, 2025

Xiangyang Hu, Xiangyu Shen, Yifei Sun, Xuhao Shan, Wenwen Min, Liyilei Su, Xiaomao Fan, Ahmed Elazab, Ruiquan Ge, Changmiao Wang(+1 more)

Figure 1 for ITCFN: Incomplete Triple-Modal Co-Attention Fusion Network for Mild Cognitive Impairment Conversion Prediction

Figure 2 for ITCFN: Incomplete Triple-Modal Co-Attention Fusion Network for Mild Cognitive Impairment Conversion Prediction

Figure 3 for ITCFN: Incomplete Triple-Modal Co-Attention Fusion Network for Mild Cognitive Impairment Conversion Prediction

Abstract:Alzheimer's disease (AD) is a common neurodegenerative disease among the elderly. Early prediction and timely intervention of its prodromal stage, mild cognitive impairment (MCI), can decrease the risk of advancing to AD. Combining information from various modalities can significantly improve predictive accuracy. However, challenges such as missing data and heterogeneity across modalities complicate multimodal learning methods as adding more modalities can worsen these issues. Current multimodal fusion techniques often fail to adapt to the complexity of medical data, hindering the ability to identify relationships between modalities. To address these challenges, we propose an innovative multimodal approach for predicting MCI conversion, focusing specifically on the issues of missing positron emission tomography (PET) data and integrating diverse medical information. The proposed incomplete triple-modal MCI conversion prediction network is tailored for this purpose. Through the missing modal generation module, we synthesize the missing PET data from the magnetic resonance imaging and extract features using specifically designed encoders. We also develop a channel aggregation module and a triple-modal co-attention fusion module to reduce feature redundancy and achieve effective multimodal data fusion. Furthermore, we design a loss function to handle missing modality issues and align cross-modal features. These components collectively harness multimodal data to boost network performance. Experimental results on the ADNI1 and ADNI2 datasets show that our method significantly surpasses existing unimodal and other multimodal models. Our code is available at https://github.com/justinhxy/ITFC.

* 5 pages, 1 figure, accepted by IEEE ISBI 2025

Via

Access Paper or Ask Questions

BS-LDM: Effective Bone Suppression in High-Resolution Chest X-Ray Images with Conditional Latent Diffusion Models

Dec 24, 2024

Yifei Sun, Zhanghao Chen, Hao Zheng, Ruiquan Ge, Jin Liu, Wenwen Min, Ahmed Elazab, Xiang Wan, Changmiao Wang

Abstract:The interference of overlapping bones and pulmonary structures can reduce the effectiveness of Chest X-ray (CXR) examinations. Bone suppression techniques have been developed to improve diagnostic accuracy. Dual-energy subtraction (DES) imaging, a common method for bone suppression, is costly and exposes patients to higher radiation levels. Deep learning-based image generation methods have been proposed as alternatives, however, they often fail to produce high-quality and high-resolution images, resulting in the loss of critical lesion information and texture details. To address these issues, in this paper, we introduce an end-to-end framework for bone suppression in high-resolution CXR images, termed BS-LDM. This framework employs a conditional latent diffusion model to generate high-resolution soft tissue images with fine detail and critical lung pathology by performing bone suppression in the latent space. We implement offset noise during the noise addition phase of the training process to better render low-frequency information in soft tissue images. Additionally, we introduce a dynamic clipping strategy during the sampling process to refine pixel intensity in the generated soft tissue images. We compiled a substantial and high-quality bone suppression dataset, SZCH-X-Rays, including high-resolution paired CXR and DES soft tissue images from 818 patients, collected from our partner hospitals. Moreover, we pre-processed 241 pairs of CXR and DES soft tissue images from the JSRT dataset, the largest publicly available dataset. Comprehensive experimental and clinical evaluations demonstrate that BS-LDM exhibits superior bone suppression capabilities, highlighting its significant clinical potential.

* 9 pages, 6 figures

Via

Access Paper or Ask Questions

Precise Facial Landmark Detection by Dynamic Semantic Aggregation Transformer

Dec 01, 2024

Jun Wan, He Liu, Yujia Wu, Zhihui Lai, Wenwen Min, Jun Liu

Figure 1 for Precise Facial Landmark Detection by Dynamic Semantic Aggregation Transformer

Figure 2 for Precise Facial Landmark Detection by Dynamic Semantic Aggregation Transformer

Figure 3 for Precise Facial Landmark Detection by Dynamic Semantic Aggregation Transformer

Figure 4 for Precise Facial Landmark Detection by Dynamic Semantic Aggregation Transformer

Abstract:At present, deep neural network methods have played a dominant role in face alignment field. However, they generally use predefined network structures to predict landmarks, which tends to learn general features and leads to mediocre performance, e.g., they perform well on neutral samples but struggle with faces exhibiting large poses or occlusions. Moreover, they cannot effectively deal with semantic gaps and ambiguities among features at different scales, which may hinder them from learning efficient features. To address the above issues, in this paper, we propose a Dynamic Semantic-Aggregation Transformer (DSAT) for more discriminative and representative feature (i.e., specialized feature) learning. Specifically, a Dynamic Semantic-Aware (DSA) model is first proposed to partition samples into subsets and activate the specific pathways for them by estimating the semantic correlations of feature channels, making it possible to learn specialized features from each subset. Then, a novel Dynamic Semantic Specialization (DSS) model is designed to mine the homogeneous information from features at different scales for eliminating the semantic gap and ambiguities and enhancing the representation ability. Finally, by integrating the DSA model and DSS model into our proposed DSAT in both dynamic architecture and dynamic parameter manners, more specialized features can be learned for achieving more precise face alignment. It is interesting to show that harder samples can be handled by activating more feature channels. Extensive experiments on popular face alignment datasets demonstrate that our proposed DSAT outperforms state-of-the-art models in the literature.Our code is available at https://github.com/GERMINO-LiuHe/DSAT.

Via

Access Paper or Ask Questions

Pretrained-Guided Conditional Diffusion Models for Microbiome Data Analysis

Aug 10, 2024

Xinyuan Shi, Fangfang Zhu, Wenwen Min

Abstract:Emerging evidence indicates that human cancers are intricately linked to human microbiomes, forming an inseparable connection. However, due to limited sample sizes and significant data loss during collection for various reasons, some machine learning methods have been proposed to address the issue of missing data. These methods have not fully utilized the known clinical information of patients to enhance the accuracy of data imputation. Therefore, we introduce mbVDiT, a novel pre-trained conditional diffusion model for microbiome data imputation and denoising, which uses the unmasked data and patient metadata as conditional guidance for imputating missing values. It is also uses VAE to integrate the the other public microbiome datasets to enhance model performance. The results on the microbiome datasets from three different cancer types demonstrate the performance of our methods in comparison with existing methods.

Via

Access Paper or Ask Questions

Masked Graph Autoencoders with Contrastive Augmentation for Spatially Resolved Transcriptomics Data

Aug 09, 2024

Donghai Fang, Fangfang Zhu, Dongting Xie, Wenwen Min

Figure 1 for Masked Graph Autoencoders with Contrastive Augmentation for Spatially Resolved Transcriptomics Data

Figure 2 for Masked Graph Autoencoders with Contrastive Augmentation for Spatially Resolved Transcriptomics Data

Figure 3 for Masked Graph Autoencoders with Contrastive Augmentation for Spatially Resolved Transcriptomics Data

Figure 4 for Masked Graph Autoencoders with Contrastive Augmentation for Spatially Resolved Transcriptomics Data

Abstract:With the rapid advancement of Spatial Resolved Transcriptomics (SRT) technology, it is now possible to comprehensively measure gene transcription while preserving the spatial context of tissues. Spatial domain identification and gene denoising are key objectives in SRT data analysis. We propose a Contrastively Augmented Masked Graph Autoencoder (STMGAC) to learn low-dimensional latent representations for domain identification. In the latent space, persistent signals for representations are obtained through self-distillation to guide self-supervised matching. At the same time, positive and negative anchor pairs are constructed using triplet learning to augment the discriminative ability. We evaluated the performance of STMGAC on five datasets, achieving results superior to those of existing baseline methods. All code and public datasets used in this paper are available at https://github.com/wenwenmin/STMGAC and https://zenodo.org/records/13253801.

Via

Access Paper or Ask Questions

Masked adversarial neural network for cell type deconvolution in spatial transcriptomics

Aug 09, 2024

Lin Huang, Xiaofei Liu, Shunfang Wang, Wenwen Min

Figure 1 for Masked adversarial neural network for cell type deconvolution in spatial transcriptomics

Figure 2 for Masked adversarial neural network for cell type deconvolution in spatial transcriptomics

Figure 3 for Masked adversarial neural network for cell type deconvolution in spatial transcriptomics

Figure 4 for Masked adversarial neural network for cell type deconvolution in spatial transcriptomics

Abstract:Accurately determining cell type composition in disease-relevant tissues is crucial for identifying disease targets. Most existing spatial transcriptomics (ST) technologies cannot achieve single-cell resolution, making it challenging to accurately determine cell types. To address this issue, various deconvolution methods have been developed. Most of these methods use single-cell RNA sequencing (scRNA-seq) data from the same tissue as a reference to infer cell types in ST data spots. However, they often overlook the differences between scRNA-seq and ST data. To overcome this limitation, we propose a Masked Adversarial Neural Network (MACD). MACD employs adversarial learning to align real ST data with simulated ST data generated from scRNA-seq data. By mapping them into a unified latent space, it can minimize the differences between the two types of data. Additionally, MACD uses masking techniques to effectively learn the features of real ST data and mitigate noise. We evaluated MACD on 32 simulated datasets and 2 real datasets, demonstrating its accuracy in performing cell type deconvolution. All code and public datasets used in this paper are available at https://github.com/wenwenmin/MACD and https://zenodo.org/records/12804822.

Via

Access Paper or Ask Questions

scASDC: Attention Enhanced Structural Deep Clustering for Single-cell RNA-seq Data

Aug 09, 2024

Wenwen Min, Zhen Wang, Fangfang Zhu, Taosheng Xu, Shunfang Wang

Figure 1 for scASDC: Attention Enhanced Structural Deep Clustering for Single-cell RNA-seq Data

Figure 2 for scASDC: Attention Enhanced Structural Deep Clustering for Single-cell RNA-seq Data

Figure 3 for scASDC: Attention Enhanced Structural Deep Clustering for Single-cell RNA-seq Data

Figure 4 for scASDC: Attention Enhanced Structural Deep Clustering for Single-cell RNA-seq Data

Abstract:Single-cell RNA sequencing (scRNA-seq) data analysis is pivotal for understanding cellular heterogeneity. However, the high sparsity and complex noise patterns inherent in scRNA-seq data present significant challenges for traditional clustering methods. To address these issues, we propose a deep clustering method, Attention-Enhanced Structural Deep Embedding Graph Clustering (scASDC), which integrates multiple advanced modules to improve clustering accuracy and robustness.Our approach employs a multi-layer graph convolutional network (GCN) to capture high-order structural relationships between cells, termed as the graph autoencoder module. To mitigate the oversmoothing issue in GCNs, we introduce a ZINB-based autoencoder module that extracts content information from the data and learns latent representations of gene expression. These modules are further integrated through an attention fusion mechanism, ensuring effective combination of gene expression and structural information at each layer of the GCN. Additionally, a self-supervised learning module is incorporated to enhance the robustness of the learned embeddings. Extensive experiments demonstrate that scASDC outperforms existing state-of-the-art methods, providing a robust and effective solution for single-cell clustering tasks. Our method paves the way for more accurate and meaningful analysis of single-cell RNA sequencing data, contributing to better understanding of cellular heterogeneity and biological processes. All code and public datasets used in this paper are available at \url{https://github.com/wenwenmin/scASDC} and \url{https://zenodo.org/records/12814320}.

Via

Access Paper or Ask Questions

High-Resolution Spatial Transcriptomics from Histology Images using HisToSGE

Jul 30, 2024

Zhiceng Shi, Shuailin Xue, Fangfang Zhu, Wenwen Min

Figure 1 for High-Resolution Spatial Transcriptomics from Histology Images using HisToSGE

Figure 2 for High-Resolution Spatial Transcriptomics from Histology Images using HisToSGE

Figure 3 for High-Resolution Spatial Transcriptomics from Histology Images using HisToSGE

Figure 4 for High-Resolution Spatial Transcriptomics from Histology Images using HisToSGE

Abstract:Spatial transcriptomics (ST) is a groundbreaking genomic technology that enables spatial localization analysis of gene expression within tissue sections. However, it is significantly limited by high costs and sparse spatial resolution. An alternative, more cost-effective strategy is to use deep learning methods to predict high-density gene expression profiles from histological images. However, existing methods struggle to capture rich image features effectively or rely on low-dimensional positional coordinates, making it difficult to accurately predict high-resolution gene expression profiles. To address these limitations, we developed HisToSGE, a method that employs a Pathology Image Large Model (PILM) to extract rich image features from histological images and utilizes a feature learning module to robustly generate high-resolution gene expression profiles. We evaluated HisToSGE on four ST datasets, comparing its performance with five state-of-the-art baseline methods. The results demonstrate that HisToSGE excels in generating high-resolution gene expression profiles and performing downstream tasks such as spatial domain identification. All code and public datasets used in this paper are available at https://github.com/wenwenmin/HisToSGE and https://zenodo.org/records/12792163.

Via

Access Paper or Ask Questions

SpaDiT: Diffusion Transformer for Spatial Gene Expression Prediction using scRNA-seq

Jul 18, 2024

Xiaoyu Li, Fangfang Zhu, Wenwen Min

Abstract:The rapid development of spatial transcriptomics (ST) technologies is revolutionizing our understanding of the spatial organization of biological tissues. Current ST methods, categorized into next-generation sequencing-based (seq-based) and fluorescence in situ hybridization-based (image-based) methods, offer innovative insights into the functional dynamics of biological tissues. However, these methods are limited by their cellular resolution and the quantity of genes they can detect. To address these limitations, we propose SpaDiT, a deep learning method that utilizes a diffusion generative model to integrate scRNA-seq and ST data for the prediction of undetected genes. By employing a Transformer-based diffusion model, SpaDiT not only accurately predicts unknown genes but also effectively generates the spatial structure of ST genes. We have demonstrated the effectiveness of SpaDiT through extensive experiments on both seq-based and image-based ST data. SpaDiT significantly contributes to ST gene prediction methods with its innovative approach. Compared to eight leading baseline methods, SpaDiT achieved state-of-the-art performance across multiple metrics, highlighting its substantial bioinformatics contribution.

Via

Access Paper or Ask Questions