Abstract:Unsupervised Domain Adaptation (UDA) seeks to transfer knowledge from a labeled source domain to an unlabeled target domain but often suffers from severe domain and scale gaps that degrade performance. Existing cross-attention-based transformers can align features across domains, yet they struggle to preserve content semantics under large appearance and scale variations. To explicitly address these challenges, we introduce the concept of beneficial noise, which regularizes cross-attention by injecting controlled perturbations, encouraging the model to ignore style distractions and focus on content. We propose the Domain-Adaptive Cross-Scale Matching (DACSM) framework, which consists of a Domain-Adaptive Transformer (DAT) for disentangling domain-shared content from domain-specific style, and a Cross-Scale Matching (CSM) module that adaptively aligns features across multiple resolutions. DAT incorporates beneficial noise into cross-attention, enabling progressive domain translation with enhanced robustness, yielding content-consistent and style-invariant representations. Meanwhile, CSM ensures semantic consistency under scale changes. Extensive experiments on VisDA-2017, Office-Home, and DomainNet demonstrate that DACSM achieves state-of-the-art performance, with up to +2.3% improvement over CDTrans on VisDA-2017. Notably, DACSM achieves a +5.9% gain on the challenging "truck" class of VisDA, evidencing the strength of beneficial noise in handling scale discrepancies. These results highlight the effectiveness of combining domain translation, beneficial-noise-enhanced attention, and scale-aware alignment for robust cross-domain representation learning.
Abstract:Spatial transcriptomics (ST) technologies have revolutionized the study of gene expression patterns in tissues by providing multimodality data in transcriptomic, spatial, and morphological, offering opportunities for understanding tissue biology beyond transcriptomics. However, we identify the modality bias phenomenon in ST data species, i.e., the inconsistent contribution of different modalities to the labels leads to a tendency for the analysis methods to retain the information of the dominant modality. How to mitigate the adverse effects of modality bias to satisfy various downstream tasks remains a fundamental challenge. This paper introduces Multiple-modality Structure Transformation, named MuST, a novel methodology to tackle the challenge. MuST integrates the multi-modality information contained in the ST data effectively into a uniform latent space to provide a foundation for all the downstream tasks. It learns intrinsic local structures by topology discovery strategy and topology fusion loss function to solve the inconsistencies among different modalities. Thus, these topology-based and deep learning techniques provide a solid foundation for a variety of analytical tasks while coordinating different modalities. The effectiveness of MuST is assessed by performance metrics and biological significance. The results show that it outperforms existing state-of-the-art methods with clear advantages in the precision of identifying and preserving structures of tissues and biomarkers. MuST offers a versatile toolkit for the intricate analysis of complex biological systems.




Abstract:Dimension reduction (DR) is commonly utilized to capture the intrinsic structure and transform high-dimensional data into low-dimensional space while retaining meaningful properties of the original data. It is used in various applications, such as image recognition, single-cell sequencing analysis, and biomarker discovery. However, contemporary parametric-free and parametric DR techniques suffer from several significant shortcomings, such as the inability to preserve global and local features and the pool generalization performance. On the other hand, regarding explainability, it is crucial to comprehend the embedding process, especially the contribution of each part to the embedding process, while understanding how each feature affects the embedding results that identify critical components and help diagnose the embedding process. To address these problems, we have developed a deep neural network method called EVNet, which provides not only excellent performance in structural maintainability but also explainability to the DR therein. EVNet starts with data augmentation and a manifold-based loss function to improve embedding performance. The explanation is based on saliency maps and aims to examine the trained EVNet parameters and contributions of components during the embedding process. The proposed techniques are integrated with a visual interface to help the user to adjust EVNet to achieve better DR performance and explainability. The interactive visual interface makes it easier to illustrate the data features, compare different DR techniques, and investigate DR. An in-depth experimental comparison shows that EVNet consistently outperforms the state-of-the-art methods in both performance measures and explainability.