Aditya Malusare

Contrastive Cross-Modal Learning for Infusing Chest X-ray Knowledge into ECGs

Jun 24, 2025

Vineet Punyamoorty, Aditya Malusare, Vaneet Aggarwal

Abstract: Modern diagnostic workflows are increasingly multimodal, integrating diverse data sources such as medical images, structured records, and physiological time series. Among these, electrocardiograms (ECGs) and chest X-rays (CXRs) are two of the most widely used modalities for cardiac assessment. While CXRs provide rich diagnostic information, ECGs are more accessible and can support scalable early warning systems. In this work, we propose CroMoTEX, a novel contrastive learning-based framework that leverages chest X-rays during training to learn clinically informative ECG representations for multiple cardiac-related pathologies: cardiomegaly, pleural effusion, and edema. Our method aligns ECG and CXR representations using a novel supervised cross-modal contrastive objective with adaptive hard negative weighting, enabling robust and task-relevant feature learning. At test time, CroMoTEX relies solely on ECG input, allowing scalable deployment in real-world settings where CXRs may be unavailable. Evaluated on the large-scale MIMIC-IV-ECG and MIMIC-CXR datasets, CroMoTEX outperforms baselines across all three pathologies, achieving up to 78.31 AUROC on edema. Our code is available at github.com/vineetpmoorty/cromotex.
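
The abstract does not spell out the loss, so the following is only a minimal sketch of what a supervised cross-modal contrastive objective with hard negative weighting could look like, not the CroMoTEX implementation. The function names, the temperature `tau`, the hardness parameter `beta`, and the softmax-style weighting are all assumptions made for illustration. Each ECG embedding is pulled toward CXR embeddings that share its label and pushed away from the rest, with more similar (harder) negatives weighted more heavily.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cross_modal_supcon(ecg, cxr, labels, tau=0.1, beta=1.0):
    """Illustrative supervised ECG-to-CXR contrastive loss (hypothetical form).

    ecg, cxr: lists of embedding vectors; ecg[i] and cxr[i] come from one patient.
    labels:   one pathology label per patient.
    tau:      temperature (assumed value).
    beta:     hardness strength; beta=0 weights every negative equally.
    """
    n = len(labels)
    total, count = 0.0, 0
    for i in range(n):
        sims = [cosine(ecg[i], cxr[j]) for j in range(n)]
        pos = [j for j in range(n) if labels[j] == labels[i]]
        neg = [j for j in range(n) if labels[j] != labels[i]]
        # Negatives that look more like the anchor get larger weights;
        # the weights are rescaled so that they average to 1.
        raw = [math.exp(beta * sims[j]) for j in neg]
        norm = sum(raw)
        weights = [len(neg) * r / norm for r in raw]
        neg_term = sum(w * math.exp(sims[j] / tau) for w, j in zip(weights, neg))
        for p in pos:
            e_pos = math.exp(sims[p] / tau)
            total += -math.log(e_pos / (e_pos + neg_term))
            count += 1
    return total / count


# Toy embeddings (made up): aligned pairs should give a much lower loss.
ecg = [[1.0, 0.0], [0.0, 1.0]]
labels = [0, 1]
aligned = cross_modal_supcon(ecg, [[1.0, 0.0], [0.0, 1.0]], labels)
swapped = cross_modal_supcon(ecg, [[0.0, 1.0], [1.0, 0.0]], labels)
```

Because only the ECG encoder is needed to produce `ecg[i]`, a model trained this way could be run at test time without any CXR input, which matches the deployment setting the abstract describes.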

BalancedDPO: Adaptive Multi-Metric Alignment

Mar 16, 2025

Dipesh Tamboli, Souradip Chakraborty, Aditya Malusare, Biplab Banerjee, Amrit Singh Bedi, Vaneet Aggarwal

Abstract: Text-to-image (T2I) diffusion models have made remarkable advancements, yet aligning them with diverse preferences remains a persistent challenge. Current methods often optimize single metrics or depend on narrowly curated datasets, leading to overfitting and limited generalization across key visual quality metrics. We present BalancedDPO, a novel extension of Direct Preference Optimization (DPO) that addresses these limitations by simultaneously aligning T2I diffusion models with multiple metrics, including human preference, CLIP score, and aesthetic quality. Our key novelty lies in aggregating consensus labels from diverse metrics in the preference distribution space, in contrast to existing reward-mixing approaches. This enables robust and scalable multi-metric alignment while maintaining the simplicity of the standard DPO pipeline. Our evaluations on the Pick-a-Pic, PartiPrompt, and HPD datasets show that BalancedDPO achieves state-of-the-art results, outperforming existing approaches across all major metrics. BalancedDPO improves average win rates over DiffusionDPO by 15%, 7.1%, and 10.3% on Pick-a-Pic, PartiPrompt, and HPD, respectively.
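
The difference between aggregating in preference space and mixing rewards can be illustrated with a small sketch. It assumes a simple majority vote over per-metric preferences; the abstract does not specify the aggregation rule, so the function names, the voting scheme, and the scores below are all hypothetical.

```python
def consensus_preference(scores_a, scores_b):
    """Each metric votes for the image it scores higher; the majority wins.

    Returns "a", "b", or None on a tie. Majority voting is an assumed rule.
    """
    votes_a = sum(1 for m in scores_a if scores_a[m] > scores_b[m])
    votes_b = sum(1 for m in scores_a if scores_a[m] < scores_b[m])
    if votes_a == votes_b:
        return None
    return "a" if votes_a > votes_b else "b"


def mixed_reward_preference(scores_a, scores_b, weights):
    """Baseline for contrast: compare weighted sums of the raw metric scores."""
    reward_a = sum(weights[m] * scores_a[m] for m in weights)
    reward_b = sum(weights[m] * scores_b[m] for m in weights)
    if reward_a == reward_b:
        return None
    return "a" if reward_a > reward_b else "b"


# Made-up scores for two candidate images of the same prompt.
# The aesthetic score lives on a much larger numeric scale than the others.
image_a = {"human_pref": 0.30, "clip": 0.28, "aesthetic": 5.0}
image_b = {"human_pref": 0.25, "clip": 0.26, "aesthetic": 5.5}
equal_weights = {"human_pref": 1.0, "clip": 1.0, "aesthetic": 1.0}

by_vote = consensus_preference(image_a, image_b)
by_mixing = mixed_reward_preference(image_a, image_b, equal_weights)
```

In this toy case two of three metrics prefer image A, so the vote picks A, while the summed reward picks B because the aesthetic score dominates numerically. A consensus label of this kind can be fed to a standard DPO loss as an ordinary winner/loser pair, which is one way to read the abstract's point about keeping the DPO pipeline unchanged.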

Improving Molecule Generation and Drug Discovery with a Knowledge-enhanced Generative Model

Feb 13, 2024

Aditya Malusare, Vaneet Aggarwal

Abstract: Recent advancements in generative models have established state-of-the-art benchmarks in generating molecules and novel drug candidates. Despite these successes, a significant gap persists between generative models and the utilization of extensive biomedical knowledge, often systematized within knowledge graphs, whose potential to inform and enhance generative processes has not been realized. In this paper, we present a novel approach that bridges this divide by developing a framework for knowledge-enhanced generative models called K-DReAM. We develop a scalable methodology to extend the functionality of knowledge graphs while preserving semantic integrity and incorporate this contextual information into a generative framework to guide a diffusion-based model. The integration of knowledge graph embeddings with our generative model furnishes a robust mechanism for producing novel drug candidates possessing specific characteristics while ensuring validity and synthesizability. K-DReAM outperforms state-of-the-art generative models on both unconditional and targeted generation tasks.

* 12 pages

Understanding the Natural Language of DNA using Encoder-Decoder Foundation Models with Byte-level Precision

Nov 04, 2023

Aditya Malusare, Harish Kothandaraman, Dipesh Tamboli, Nadia A. Lanman, Vaneet Aggarwal

Abstract: This paper presents the Ensemble Nucleotide Byte-level Encoder-Decoder (ENBED) foundation model, analyzing DNA sequences at byte-level precision with an encoder-decoder Transformer architecture. ENBED uses a sub-quadratic implementation of attention to develop an efficient model capable of sequence-to-sequence transformations, generalizing previous genomic models with encoder-only or decoder-only architectures. We use Masked Language Modeling to pre-train the foundation model using reference genome sequences and apply it in the following downstream tasks: (1) identification of enhancers, promoters, and splice sites, (2) identification of biological function annotations of genomic sequences, (3) recognition of sequences containing base call mismatches and insertion/deletion errors, an advantage over tokenization schemes involving multiple base pairs, which lose the ability to analyze with byte-level precision, and (4) generating mutations of the Influenza virus using the encoder-decoder architecture and validating them against real-world observations. In each of these tasks, we demonstrate significant improvement as compared to the existing state-of-the-art results.
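
The point in task (3) about multi-base tokenization can be made concrete with a small sketch. It is not ENBED's tokenizer: the helper names, the choice of non-overlapping 3-mers, and the toy sequences are assumptions for illustration. With one token per nucleotide, a single-base mismatch changes exactly one token at the mutated position, whereas a k-mer token only localizes the change to a window of k bases, and an insertion shifts every downstream k-mer.

```python
def byte_tokens(seq):
    """One token per nucleotide: the ASCII byte value of each base."""
    return list(seq.encode("ascii"))


def kmer_tokens(seq, k=3):
    """Non-overlapping k-mer tokens (k=3 is an arbitrary illustrative choice)."""
    return [seq[i:i + k] for i in range(0, len(seq), k)]


def changed_positions(tokens_a, tokens_b):
    """Indices at which two token lists differ, over their common length."""
    return [i for i, (x, y) in enumerate(zip(tokens_a, tokens_b)) if x != y]


reference = "ACGTACGTACGT"
mismatch = "ACGTACCTACGT"    # G -> C substitution at base index 6
insertion = "ACGTAACGTACGT"  # extra A inserted at base index 5

# Byte level: exactly one token differs, at the mutated base.
byte_diff = changed_positions(byte_tokens(reference), byte_tokens(mismatch))

# 3-mer level: one token differs, but it spans bases 6 to 8.
kmer_diff = changed_positions(kmer_tokens(reference), kmer_tokens(mismatch))

# An insertion shifts the reading frame of every later 3-mer.
kmer_indel_diff = changed_positions(kmer_tokens(reference), kmer_tokens(insertion))
```

Here `byte_diff` pinpoints the substituted base, `kmer_diff` identifies only the containing 3-mer, and `kmer_indel_diff` shows all tokens after the insertion point changing at once.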

* 12 pages