Valentina Boeva

scSSL-Bench: Benchmarking Self-Supervised Learning for Single-Cell Data

Jun 10, 2025

Olga Ovcharenko, Florian Barkmann, Philip Toma, Imant Daunhawer, Julia Vogt, Sebastian Schelter, Valentina Boeva

Abstract: Self-supervised learning (SSL) has proven to be a powerful approach for extracting biologically meaningful representations from single-cell data. To advance our understanding of SSL methods applied to single-cell data, we present scSSL-Bench, a comprehensive benchmark that evaluates nineteen SSL methods. Our evaluation spans nine datasets and focuses on three common downstream tasks: batch correction, cell type annotation, and missing modality prediction. Furthermore, we systematically assess various data augmentation strategies. Our analysis reveals task-specific trade-offs: the specialized single-cell frameworks, scVI, CLAIRE, and the finetuned scGPT excel at uni-modal batch correction, while generic SSL methods, such as VICReg and SimCLR, demonstrate superior performance in cell typing and multi-modal data integration. Random masking emerges as the most effective augmentation technique across all tasks, surpassing domain-specific augmentations. Notably, our results indicate the need for a specialized single-cell multi-modal data integration framework. scSSL-Bench provides a standardized evaluation platform and concrete recommendations for applying SSL to single-cell analysis, advancing the convergence of deep learning and single-cell genomics.

* Accepted at ICML 2025 (Spotlight)
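
As a rough illustration of the random masking augmentation mentioned in the abstract above, the sketch below zeroes out a random subset of entries in a cell-by-gene expression matrix. This is a generic version written for illustration only, not the benchmark's implementation: the function name `random_mask`, the `mask_fraction` parameter, and its default value are assumptions that do not come from the paper.

```python
import numpy as np


def random_mask(expression, mask_fraction=0.2, rng=None):
    """Zero out a random subset of entries in an expression matrix.

    Hypothetical helper for illustration; the name, signature, and
    default mask_fraction are assumptions, not taken from scSSL-Bench.
    """
    rng = np.random.default_rng() if rng is None else rng
    expression = np.asarray(expression, dtype=float)
    # Each entry is kept independently with probability 1 - mask_fraction.
    keep = rng.random(expression.shape) >= mask_fraction
    # Masked entries become 0; kept entries are left unchanged.
    return np.where(keep, expression, 0.0), keep


# Two independently masked views of the same cells, as a contrastive
# SSL method might consume them (toy data, fixed seed).
rng = np.random.default_rng(0)
cells = rng.poisson(2.0, size=(4, 6)).astype(float)
view_a, _ = random_mask(cells, mask_fraction=0.2, rng=rng)
view_b, _ = random_mask(cells, mask_fraction=0.2, rng=rng)
```

Each call draws a fresh mask, so two calls on the same input give two different views of the same cells.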

Feature Clock: High-Dimensional Effects in Two-Dimensional Plots

Aug 06, 2024

Olga Ovcharenko, Rita Sevastjanova, Valentina Boeva

Abstract: Humans struggle to perceive and interpret high-dimensional data. Therefore, high-dimensional data are often projected into two dimensions for visualization. Many applications benefit from complex nonlinear dimensionality reduction techniques, but the effects of individual high-dimensional features are hard to explain in the two-dimensional space. Most visualization solutions use multiple two-dimensional plots, each showing the effect of one high-dimensional feature in two dimensions; this approach creates a need for a visual inspection of k plots for a k-dimensional input space. Our solution, Feature Clock, provides a novel approach that eliminates the need to inspect these k plots to grasp the influence of original features on the data structure depicted in two dimensions. Feature Clock enhances the explainability and compactness of visualizations of embedded data and is available in an open-source Python library.

* To be published in IEEE VIS 2024

Histopathological Image Classification with Cell Morphology Aware Deep Neural Networks

Jul 11, 2024

Andrey Ignatov, Josephine Yates, Valentina Boeva

Abstract: Histopathological images are widely used for the analysis of diseased (tumor) tissues and patient treatment selection. While most microscopy image processing was previously done manually by pathologists, recent advances in computer vision allow for accurate recognition of lesion regions with deep learning-based solutions. Such models, however, usually require extensive annotated datasets for training, which are often unavailable in the considered task, where the number of available patient data samples is very limited. To deal with this problem, we propose a novel DeepCMorph model pre-trained to learn cell morphology and identify a large number of different cancer types. The model consists of two modules: the first performs cell nuclei segmentation and annotates each cell type, and is trained on a combination of 8 publicly available datasets to ensure high generalizability and robustness. The second module combines the obtained segmentation map with the original microscopy image and is trained for the downstream task. We pre-trained this module on the Pan-Cancer TCGA dataset consisting of over 270K tissue patches extracted from 8736 diagnostic slides from 7175 patients. The proposed solution achieved new state-of-the-art performance on this dataset, detecting 32 cancer types with over 82% accuracy and outperforming all previously proposed solutions by more than 4%. We demonstrate that the resulting pre-trained model can be easily fine-tuned on smaller microscopy datasets, yielding superior results compared to the current top solutions and models initialized with ImageNet weights. The code and pre-trained models presented in this paper are available at: https://github.com/aiff22/DeepCMorph

scTree: Discovering Cellular Hierarchies in the Presence of Batch Effects in scRNA-seq Data

Jun 27, 2024

Moritz Vandenhirtz, Florian Barkmann, Laura Manduchi, Julia E. Vogt, Valentina Boeva

Abstract: We propose a novel method, scTree, for single-cell Tree Variational Autoencoders, extending a hierarchical clustering approach to single-cell RNA sequencing data. scTree corrects for batch effects while simultaneously learning a tree-structured data representation. This VAE-based method allows for a more in-depth understanding of complex cellular landscapes independently of the biasing effects of batches. We show empirically on seven datasets that scTree discovers the underlying clusters of the data and the hierarchical relations between them, and that it outperforms established baseline methods across these datasets. Additionally, we analyze the learned hierarchy to understand its biological relevance, thus underpinning the importance of integrating batch correction directly into the clustering procedure.
