Shantanu Singh

Broad Institute of MIT and Harvard, United States

cp_measure: API-first feature extraction for image-based profiling workflows

Jul 01, 2025

Alán F. Muñoz, Tim Treis, Alexandr A. Kalinin, Shatavisha Dasgupta, Fabian Theis, Anne E. Carpenter, Shantanu Singh

Abstract:Biological image analysis has traditionally focused on measuring specific visual properties of interest for cells or other entities. A complementary paradigm gaining increasing traction is image-based profiling - quantifying many distinct visual features to form comprehensive profiles which may reveal hidden patterns in cellular states, drug responses, and disease mechanisms. While current tools like CellProfiler can generate these feature sets, they pose significant barriers to automated and reproducible analyses, hindering machine learning workflows. Here we introduce cp_measure, a Python library that extracts CellProfiler's core measurement capabilities into a modular, API-first tool designed for programmatic feature extraction. We demonstrate that cp_measure features retain high fidelity with CellProfiler features while enabling seamless integration with the scientific Python ecosystem. Through applications to 3D astrocyte imaging and spatial transcriptomics, we showcase how cp_measure enables reproducible, automated image-based profiling pipelines that scale effectively for machine learning applications in computational biology.

* 10 pages, 4 figures, 4 supplementary figures. CODEML Workshop paper accepted (non-archival), as a part of ICML2025 events

Learning Molecular Representation in a Cell

Jun 17, 2024

Gang Liu, Srijit Seal, John Arevalo, Zhenwen Liang, Anne E. Carpenter, Meng Jiang, Shantanu Singh

Abstract:Predicting drug efficacy and safety in vivo requires information on biological responses (e.g., cell morphology and gene expression) to small molecule perturbations. However, current molecular representation learning methods do not provide a comprehensive view of cell states under these perturbations and struggle to remove noise, hindering model generalization. We introduce the Information Alignment (InfoAlign) approach to learn molecular representations through the information bottleneck method in cells. We integrate molecules and cellular response data as nodes into a context graph, connecting them with weighted edges based on chemical, biological, and computational criteria. For each molecule in a training batch, InfoAlign optimizes the encoder's latent representation with a minimality objective to discard redundant structural information. A sufficiency objective decodes the representation to align with different feature spaces from the molecule's neighborhood in the context graph. We demonstrate that the proposed sufficiency objective for alignment is tighter than existing encoder-based contrastive methods. Empirically, we validate representations from InfoAlign in two downstream tasks: molecular property prediction against up to 19 baseline methods across four datasets, plus zero-shot molecule-morphology matching.

* 21 pages, 8 tables, 7 figures
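
As a rough illustration of the information bottleneck idea named in the InfoAlign abstract above, the generic objective can be written as follows. Here $X$ is the molecule input, $Z_\theta$ the encoder's latent representation, $Y$ the target features drawn from the molecule's neighborhood in the context graph, and $\beta$ a trade-off weight. The symbol names and the single-target form are assumptions made for illustration; the paper's actual bounds and decoders may differ.

$$\min_{\theta}\; I(X; Z_\theta) \;-\; \beta\, I(Z_\theta; Y)$$

The first term plays the role of the minimality objective (discard redundant structural information) and the second that of the sufficiency objective (keep what predicts the neighborhood features).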

MOTI$\mathcal{VE}$: A Drug-Target Interaction Graph For Inductive Link Prediction

Jun 12, 2024

John Arevalo, Ellen Su, Anne E Carpenter, Shantanu Singh

Abstract:Drug-target interaction (DTI) prediction is crucial for identifying new therapeutics and detecting mechanisms of action. While structure-based methods accurately model physical interactions between a drug and its protein target, cell-based assays such as Cell Painting can better capture complex DTI interactions. This paper introduces MOTI$\mathcal{VE}$, a Morphological cOmpound Target Interaction Graph dataset that comprises Cell Painting features for $11,000$ genes and $3,600$ compounds along with their relationships extracted from seven publicly available databases. We provide random, cold-source (new drugs), and cold-target (new genes) data splits to enable rigorous evaluation under realistic use cases. Our benchmark results show that graph neural networks that use Cell Painting features consistently outperform those that learn from graph structure alone, feature-based models, and topological heuristics. MOTI$\mathcal{VE}$ accelerates both graph ML research and drug discovery by promoting the development of more reliable DTI prediction models. MOTI$\mathcal{VE}$ resources are available at https://github.com/carpenter-singh-lab/motive.
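
The three kinds of split mentioned in the abstract above can be sketched on a plain list of (drug, gene) pairs. This is a minimal sketch, not the MOTI$\mathcal{VE}$ implementation: the function name `split_edges`, its parameters, and the toy identifiers are hypothetical, and the released dataset may build its splits differently.

```python
import random


def split_edges(edges, mode="random", test_frac=0.2, seed=0):
    """Split (drug, gene) pairs into (train, test) lists.

    mode="random":      hold out individual edges.
    mode="cold_source": hold out every edge of a sampled set of drugs.
    mode="cold_target": hold out every edge of a sampled set of genes.
    """
    rng = random.Random(seed)

    if mode == "random":
        shuffled = list(edges)
        rng.shuffle(shuffled)
        n_test = int(len(shuffled) * test_frac)
        return shuffled[n_test:], shuffled[:n_test]

    # Position of the node type to hold out: 0 = drug, 1 = gene.
    idx = {"cold_source": 0, "cold_target": 1}[mode]
    nodes = sorted({edge[idx] for edge in edges})
    rng.shuffle(nodes)
    n_held = max(1, int(len(nodes) * test_frac))
    held_out = set(nodes[:n_held])

    # A held-out node never appears in training, so the model
    # must generalize to drugs (or genes) it has not seen.
    train = [e for e in edges if e[idx] not in held_out]
    test = [e for e in edges if e[idx] in held_out]
    return train, test


if __name__ == "__main__":
    toy_edges = [
        ("d1", "g1"), ("d1", "g2"), ("d2", "g1"),
        ("d3", "g3"), ("d4", "g2"), ("d5", "g4"),
    ]
    for mode in ("random", "cold_source", "cold_target"):
        train, test = split_edges(toy_edges, mode=mode)
        print(mode, len(train), len(test))
```

In the cold splits, no held-out drug (or gene) appears among the training pairs, which is what makes the evaluation inductive rather than transductive.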

Understanding Biology in the Age of Artificial Intelligence

Mar 06, 2024

Elsa Lawrence, Adham El-Shazly, Srijit Seal, Chaitanya K Joshi, Pietro Liò, Shantanu Singh, Andreas Bender, Pietro Sormanni, Matthew Greenig

Abstract:Modern life sciences research is increasingly relying on artificial intelligence approaches to model biological systems, primarily centered around the use of machine learning (ML) models. Although ML is undeniably useful for identifying patterns in large, complex data sets, its widespread application in biological sciences represents a significant deviation from traditional methods of scientific inquiry. As such, the interplay between these models and scientific understanding in biology is a topic with important implications for the future of scientific research, yet it is a subject that has received little attention. Here, we draw from an epistemological toolkit to contextualize recent applications of ML in biological sciences under modern philosophical theories of understanding, identifying general principles that can guide the design and application of ML systems to model biological phenomena and advance scientific knowledge. We propose that conceptions of scientific understanding as information compression, qualitative intelligibility, and dependency relation modelling provide a useful framework for interpreting ML-mediated understanding of biological systems. Through a detailed analysis of two key application areas of ML in modern biological research - protein structure prediction and single cell RNA-sequencing - we explore how these features have thus far enabled ML systems to advance scientific understanding of their target phenomena, how they may guide the development of future ML models, and the key obstacles that remain in preventing ML from achieving its potential as a tool for biological discovery. Consideration of the epistemological features of ML applications in biology will improve the prospects of these methods to solve important problems and advance scientific understanding of living systems.

Pseudo-Labeling Enhanced by Privileged Information and Its Application to In Situ Sequencing Images

Jun 28, 2023

Marzieh Haghighi, Mario C. Cruz, Erin Weisbart, Beth A. Cimini, Avtar Singh, Julia Bauman, Maria E. Lozada, Sanam L. Kavari, James T. Neal, Paul C. Blainey (+2 more)

Abstract:Various strategies for label-scarce object detection have been explored by the computer vision research community. These strategies mainly rely on assumptions that are specific to natural images and not directly applicable to the biological and biomedical vision domains. For example, most semi-supervised learning strategies rely on a small set of labeled data as a confident source of ground truth. In many biological vision applications, however, the ground truth is unknown and indirect information might be available in the form of noisy estimations or orthogonal evidence. In this work, we frame a crucial problem in spatial transcriptomics - decoding barcodes from In-Situ-Sequencing (ISS) images - as a semi-supervised object detection (SSOD) problem. Our proposed framework incorporates additional available sources of information into a semi-supervised learning framework in the form of privileged information. The privileged information is incorporated into the teacher's pseudo-labeling in a teacher-student self-training iteration. Although the available privileged information could be data domain specific, we have introduced a general strategy of pseudo-labeling enhanced by privileged information (PLePI) and exemplified the concept using ISS images, as well as on the COCO benchmark using extra evidence provided by CLIP.

* IJCAI 2023

Smart Laptop Bag with Machine Learning for Activity Recognition

Apr 14, 2019

Dwij Sukeshkumar Sheth, Shantanu Singh, Prakhar S Mathur, Vydeki D

Abstract:In today's world of smart living, the smart laptop bag presented in this paper provides a better solution for keeping track of our precious possessions and monitoring them in real time. As the world moves in a more tech-savvy direction, the novel laptop bag discussed here lets the user perform location tracking, ambiance monitoring, user-state monitoring, etc. in one device. The innovative design uses cloud computing and machine learning algorithms to monitor the health of the user and many parameters of the bag. The emergency alert system in this bag can be trained to send appropriate notifications to the user's emergency contacts in case of abnormal health conditions or theft of the bag. The experimental smart laptop bag uses a deep neural network, which was trained and tested on various parameters from the bag and achieves above 95% accuracy.
