Oren Kraus

RxRx3-core: Benchmarking drug-target interactions in High-Content Microscopy

Mar 26, 2025

Oren Kraus, Federico Comitani, John Urbanik, Kian Kenyon-Dean, Lakshmanan Arumugam, Saber Saberian, Cas Wognum, Safiye Celik, Imran S. Haque

Abstract: High Content Screening (HCS) microscopy datasets have transformed the ability to profile cellular responses to genetic and chemical perturbations, enabling cell-based inference of drug-target interactions (DTI). However, the adoption of representation learning methods for HCS data has been hindered by the lack of accessible datasets and robust benchmarks. To address this gap, we present RxRx3-core, a curated and compressed subset of the RxRx3 dataset, and an associated DTI benchmarking task. At just 18GB, RxRx3-core significantly reduces the size barrier associated with large-scale HCS datasets while preserving critical data necessary for benchmarking representation learning models against a zero-shot DTI prediction task. RxRx3-core includes 222,601 microscopy images spanning 736 CRISPR knockouts and 1,674 compounds at 8 concentrations. RxRx3-core is available on HuggingFace and Polaris, along with pre-trained embeddings and benchmarking code, ensuring accessibility for the research community. By providing a compact dataset and robust benchmarks, we aim to accelerate innovation in representation learning methods for HCS data and support the discovery of novel biological insights.

* Published at LMRL Workshop at ICLR 2025

ViTally Consistent: Scaling Biological Representation Learning for Cell Microscopy

Nov 04, 2024

Kian Kenyon-Dean, Zitong Jerry Wang, John Urbanik, Konstantin Donhauser, Jason Hartford, Saber Saberian, Nil Sahin, Ihab Bendidi, Safiye Celik, Marta Fay(+3 more)

Abstract: Large-scale cell microscopy screens are used in drug discovery and molecular biology research to study the effects of millions of chemical and genetic perturbations on cells. To use these images in downstream analysis, we need models that can map each image into a feature space that represents diverse biological phenotypes consistently, in the sense that perturbations with similar biological effects have similar representations. In this work, we present the largest foundation model for cell microscopy data to date, a new 1.9 billion-parameter ViT-G/8 MAE trained on over 8 billion microscopy image crops. Compared to a previously published ViT-L/8 MAE, our new model achieves a 60% improvement in linear separability of genetic perturbations and obtains the best overall performance on whole-genome biological relationship recall and replicate consistency benchmarks. Beyond scaling, we developed two key methods that improve performance: (1) training on a curated and diverse dataset; and (2) using biologically motivated linear probing tasks to search across each transformer block for the best candidate representation of whole-genome screens. We find that many self-supervised vision transformers, pretrained on either natural or microscopy images, yield significantly more biologically meaningful representations of microscopy images in their intermediate blocks than in their typically used final blocks. More broadly, our approach and results provide insights toward a general strategy for successfully building foundation models for large-scale biological data.

* NeurIPS 2024 Foundation Models for Science Workshop (38th Conference on Neural Information Processing Systems). 18 pages, 7 figures

Masked Autoencoders for Microscopy are Scalable Learners of Cellular Biology

Apr 16, 2024

Oren Kraus, Kian Kenyon-Dean, Saber Saberian, Maryam Fallah, Peter McLean, Jess Leung, Vasudev Sharma, Ayla Khan, Jia Balakrishnan, Safiye Celik(+7 more)

Abstract: Featurizing microscopy images for use in biological research remains a significant challenge, especially for large-scale experiments spanning millions of images. This work explores the scaling properties of weakly supervised classifiers and self-supervised masked autoencoders (MAEs) when training with increasingly larger model backbones and microscopy datasets. Our results show that ViT-based MAEs outperform weakly supervised classifiers on a variety of tasks, achieving as much as an 11.5% relative improvement when recalling known biological relationships curated from public databases. Additionally, we develop a new channel-agnostic MAE architecture (CA-MAE) that allows for inputting images of different numbers and orders of channels at inference time. We demonstrate that CA-MAEs effectively generalize by inferring and evaluating on a microscopy image dataset (JUMP-CP) generated under different experimental conditions with a different channel structure than our pretraining data (RPI-93M). Our findings motivate continued research into scaling self-supervised learning on microscopy data in order to create powerful foundation models of cellular biology that have the potential to catalyze advancements in drug discovery and beyond.

* CVPR 2024 Highlight. arXiv admin note: text overlap with arXiv:2309.16064

Masked autoencoders are scalable learners of cellular morphology

Sep 27, 2023

Oren Kraus, Kian Kenyon-Dean, Saber Saberian, Maryam Fallah, Peter McLean, Jess Leung, Vasudev Sharma, Ayla Khan, Jia Balakrishnan, Safiye Celik(+6 more)

Abstract: Inferring biological relationships from cellular phenotypes in high-content microscopy screens provides significant opportunity and challenge in biological research. Prior results have shown that deep vision models can capture biological signal better than hand-crafted features. This work explores how weakly supervised and self-supervised deep learning approaches scale when training larger models on larger datasets. Our results show that both CNN- and ViT-based masked autoencoders significantly outperform weakly supervised models. At the high end of our scale, a ViT-L/8 trained on over 3.5 billion unique crops sampled from 95 million microscopy images achieves relative improvements as high as 28% over our best weakly supervised models at inferring known biological relationships curated from public databases.

* 4 pages, 4 figures

RxRx1: A Dataset for Evaluating Experimental Batch Correction Methods

Jan 13, 2023

Maciej Sypetkowski, Morteza Rezanejad, Saber Saberian, Oren Kraus, John Urbanik, James Taylor, Ben Mabey, Mason Victors, Jason Yosinski, Alborz Rezazadeh Sereshkeh(+2 more)

Abstract: High-throughput screening techniques are commonly used to obtain large quantities of data in many fields of biology. It is well known that artifacts arising from variability in the technical execution of different experimental batches within such screens confound these observations and can lead to invalid biological conclusions. It is therefore necessary to account for these batch effects when analyzing outcomes. In this paper we describe RxRx1, a biological dataset designed specifically for the systematic study of batch effect correction methods. The dataset consists of 125,510 high-resolution fluorescence microscopy images of human cells under 1,138 genetic perturbations in 51 experimental batches across 4 cell types. Visual inspection of the images alone clearly demonstrates significant batch effects. We propose a classification task designed to evaluate the effectiveness of experimental batch correction methods on these images and examine the performance of a number of correction methods on this task. Our goal in releasing RxRx1 is to encourage the development of effective experimental batch correction methods that generalize well to unseen experimental batches. The dataset can be downloaded at https://rxrx.ai.
