Dongting Xie

A Comprehensive Survey on Imbalanced Data Learning

Feb 13, 2025

Xinyi Gao, Dongting Xie, Yihang Zhang, Zhengren Wang, Conghui He, Hongzhi Yin, Wentao Zhang

Abstract: With the expansion of data availability, machine learning (ML) has achieved remarkable breakthroughs in both academia and industry. However, imbalanced data distributions are prevalent in various types of raw data and severely hinder the performance of ML by biasing the decision-making processes. To deepen the understanding of imbalanced data and facilitate related research and applications, this survey systematically analyzes various real-world data formats and organizes existing research across these formats into four distinct categories: data re-balancing, feature representation, training strategy, and ensemble learning. This structured analysis helps researchers comprehensively understand the pervasive nature of imbalance across diverse data formats, thereby paving a clearer path toward achieving specific research goals. We also provide an overview of relevant open-source libraries, spotlight current challenges, and offer novel insights aimed at fostering future advancements in this critical area of study.

Masked Graph Autoencoders with Contrastive Augmentation for Spatially Resolved Transcriptomics Data

Aug 09, 2024

Donghai Fang, Fangfang Zhu, Dongting Xie, Wenwen Min

Abstract: With the rapid advancement of Spatially Resolved Transcriptomics (SRT) technology, it is now possible to comprehensively measure gene transcription while preserving the spatial context of tissues. Spatial domain identification and gene denoising are key objectives in SRT data analysis. We propose a Contrastively Augmented Masked Graph Autoencoder (STMGAC) to learn low-dimensional latent representations for domain identification. In the latent space, persistent signals for representations are obtained through self-distillation to guide self-supervised matching. At the same time, positive and negative anchor pairs are constructed using triplet learning to augment the discriminative ability. We evaluated the performance of STMGAC on five datasets, achieving results superior to those of existing baseline methods. All code and public datasets used in this paper are available at https://github.com/wenwenmin/STMGAC and https://zenodo.org/records/13253801.
