Ella Barkan

BioVERSE: Representation Alignment of Biomedical Modalities to LLMs for Multi-Modal Reasoning

Oct 01, 2025

Ching-Huei Tsou, Michal Ozery-Flato, Ella Barkan, Diwakar Mahajan, Ben Shapira

Abstract: Recent advances in large language models (LLMs) and biomedical foundation models (BioFMs) have achieved strong results in biological text reasoning, molecular modeling, and single-cell analysis, yet they remain siloed in disjoint embedding spaces, limiting cross-modal reasoning. We present BIOVERSE (Biomedical Vector Embedding Realignment for Semantic Engagement), a two-stage approach that adapts pretrained BioFMs as modality encoders and aligns them with LLMs through lightweight, modality-specific projection layers. The approach first aligns each modality to a shared LLM space through independently trained projections, allowing them to interoperate naturally, and then applies standard instruction tuning with multi-modal data to bring them together for downstream reasoning. By unifying raw biomedical data with knowledge embedded in LLMs, the approach enables zero-shot annotation, cross-modal question answering, and interactive, explainable dialogue. Across tasks spanning cell-type annotation, molecular description, and protein function reasoning, compact BIOVERSE configurations surpass larger LLM baselines while enabling richer, generative outputs than existing BioFMs, establishing a foundation for principled multi-modal biomedical reasoning.
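The projection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the single linear layer, the random initialization, and the choice to prepend the projected vector as one "soft token" ahead of the text embeddings are all assumptions made for illustration. The abstract only states that lightweight, modality-specific projections map BioFM embeddings into a shared LLM space.

```python
import random


def make_projection(d_in, d_out, seed=0):
    """Create a hypothetical linear projection (weights, bias) from encoder space to LLM space."""
    rng = random.Random(seed)
    scale = 1.0 / d_in ** 0.5  # simple scaled-Gaussian init (an assumption)
    weights = [[rng.gauss(0.0, scale) for _ in range(d_in)] for _ in range(d_out)]
    bias = [0.0] * d_out
    return weights, bias


def project(embedding, weights, bias):
    """Apply y = W x + b to a single encoder embedding."""
    return [sum(w * x for w, x in zip(row, embedding)) + b
            for row, b in zip(weights, bias)]


def build_llm_inputs(text_token_embeddings, modality_embedding, projection):
    """Prepend the projected modality embedding as one soft token (illustrative choice)."""
    weights, bias = projection
    soft_token = project(modality_embedding, weights, bias)
    return [soft_token] + text_token_embeddings
```

In this sketch, each modality would get its own `make_projection` call, so projections can be trained independently before any joint instruction tuning.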

BMFM-RNA: An Open Framework for Building and Evaluating Transcriptomic Foundation Models

Jun 17, 2025

Bharath Dandala, Michael M. Danziger, Ella Barkan, Tanwi Biswas, Viatcheslav Gurev, Jianying Hu, Matthew Madgwick, Akira Koseki, Tal Kozlovski, Michal Rosen-Zvi (+2 more)

Abstract: Transcriptomic foundation models (TFMs) have recently emerged as powerful tools for analyzing gene expression in cells and tissues, supporting key tasks such as cell-type annotation, batch correction, and perturbation prediction. However, the diversity of model implementations and training strategies across recent TFMs, though promising, makes it challenging to isolate the contribution of individual design choices or evaluate their potential synergies. This hinders the field's ability to converge on best practices and limits the reproducibility of insights across studies. We present BMFM-RNA, an open-source, modular software package that unifies diverse TFM pretraining and fine-tuning objectives within a single framework. Leveraging this capability, we introduce a novel training objective, whole cell expression decoder (WCED), which captures global expression patterns using an autoencoder-like CLS bottleneck representation. In this paper, we describe the framework, supported input representations, and training objectives. We evaluated four model checkpoints pretrained on CELLxGENE using combinations of masked language modeling (MLM), WCED and multitask learning. Using the benchmarking capabilities of BMFM-RNA, we show that WCED-based models achieve performance that matches or exceeds state-of-the-art approaches like scGPT across more than a dozen datasets in both zero-shot and fine-tuning tasks. BMFM-RNA, available as part of the biomed-multi-omics project (https://github.com/BiomedSciAI/biomed-multi-omic), offers a reproducible foundation for systematic benchmarking and community-driven exploration of optimal TFM training strategies, enabling the development of more effective tools to leverage the latest advances in AI for understanding cell biology.
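The WCED idea of reconstructing a cell's full expression profile from a single CLS bottleneck vector can be sketched roughly as follows. This is an illustration under stated assumptions, not the BMFM-RNA code: the linear decoder, the mean-squared-error loss, and the function names `decode_expression` and `wced_style_loss` are hypothetical, since the abstract does not specify the decoder architecture or the loss.

```python
def decode_expression(cls_embedding, decoder_weights, decoder_bias):
    """Map one CLS vector to a predicted expression value per gene (linear decoder, assumed)."""
    return [sum(w * h for w, h in zip(row, cls_embedding)) + b
            for row, b in zip(decoder_weights, decoder_bias)]


def wced_style_loss(cls_embedding, decoder_weights, decoder_bias, true_expression):
    """Mean squared error between decoded and observed expression (loss choice is an assumption)."""
    predicted = decode_expression(cls_embedding, decoder_weights, decoder_bias)
    squared_errors = [(p - t) ** 2 for p, t in zip(predicted, true_expression)]
    return sum(squared_errors) / len(squared_errors)
```

Because every gene is decoded from the same vector, the CLS embedding is pushed to summarize the whole cell, which is the autoencoder-like bottleneck the abstract refers to.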

MAMMAL -- Molecular Aligned Multi-Modal Architecture and Language

Oct 28, 2024

Yoel Shoshan, Moshiko Raboh, Michal Ozery-Flato, Vadim Ratner, Alex Golts, Jeffrey K. Weber, Ella Barkan, Simona Rabinovici-Cohen, Sagi Polaczek, Ido Amos (+9 more)

Abstract: Drug discovery typically consists of multiple steps, including identifying a target protein key to a disease's etiology, validating that interacting with this target could prevent symptoms or cure the disease, discovering a small molecule or biologic therapeutic to interact with it, and optimizing the candidate molecule through a complex landscape of required properties. Drug discovery related tasks often involve prediction and generation while considering multiple entities that potentially interact, which poses a challenge for typical AI models. For this purpose we present MAMMAL - Molecular Aligned Multi-Modal Architecture and Language - a method that we applied to create a versatile multi-task foundation model ibm/biomed.omics.bl.sm.ma-ted-458m that learns from large-scale biological datasets (2 billion samples) across diverse modalities, including proteins, small molecules, and genes. We introduce a prompt syntax that supports a wide range of classification, regression, and generation tasks. It allows combining different modalities and entity types as inputs and/or outputs. Our model handles combinations of tokens and scalars and enables the generation of small molecules and proteins, property prediction, and transcriptomic lab test predictions. We evaluated the model on 11 diverse downstream tasks spanning different steps within a typical drug discovery pipeline, where it reaches new SOTA in 9 tasks and is comparable to SOTA in 2 tasks. This performance is achieved while using a unified architecture serving all tasks, in contrast to the original SOTA performance achieved using tailored architectures. The model code and pretrained weights are publicly available at https://github.com/BiomedSciAI/biomed-multi-alignment and https://huggingface.co/ibm/biomed.omics.bl.sm.ma-ted-458m.

Mammography Dual View Mass Correspondence

Jul 02, 2018

Shaked Perek, Alon Hazan, Ella Barkan, Ayelet Akselrod-Ballin

Abstract: Standard breast cancer screening involves the acquisition of two mammography X-ray projections for each breast. Typically, a comparison of both views supports the challenging task of tumor detection and localization. We introduce a deep learning, patch-based Siamese network for lesion matching in dual-view mammography. Our locally-fitted approach generates a joint patch pair representation and comparison with a shared configuration between the two views. We performed a comprehensive set of experiments with the network on standard datasets, among them the large Digital Database for Screening Mammography (DDSM). We analyzed the effect of transfer learning with the network between different types of datasets and compared the network-based matching to using Euclidean distance by template matching. Finally, we evaluated the contribution of the matching network in a full detection pipeline. Experimental results demonstrate the promise of improved detection accuracy using our approach.
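The Euclidean-distance baseline that the abstract compares against can be sketched as below. This is an assumed, simplified form of template matching, not the paper's code: patches are small 2D lists of pixel intensities of equal shape, and the helper names `euclidean_distance` and `best_match` are hypothetical. The Siamese network in the paper would replace the raw pixel distance with a learned comparison.

```python
import math


def euclidean_distance(patch_a, patch_b):
    """Euclidean distance between two equally sized 2D patches, compared pixel by pixel."""
    flat_a = [value for row in patch_a for value in row]
    flat_b = [value for row in patch_b for value in row]
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(flat_a, flat_b)))


def best_match(query_patch, candidate_patches):
    """Return (index, distance) of the candidate patch closest to the query patch."""
    distances = [euclidean_distance(query_patch, c) for c in candidate_patches]
    index = min(range(len(distances)), key=distances.__getitem__)
    return index, distances[index]
```

Here the query would be a lesion patch from one view and the candidates would be patches from the other view of the same breast.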
