Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Navyansh Mahla

Federated Cross-Modal Style-Aware Prompt Generation

Aug 17, 2025

Suraj Prasad, Navyansh Mahla, Sunny Gupta, Amit Sethi

Abstract:Prompt learning has propelled vision-language models like CLIP to excel in diverse tasks, making them ideal for federated learning due to computational efficiency. However, conventional approaches that rely solely on final-layer features miss out on rich multi-scale visual cues and domain-specific style variations in decentralized client data. To bridge this gap, we introduce FedCSAP (Federated Cross-Modal Style-Aware Prompt Generation). Our framework harnesses low, mid, and high-level features from CLIP's vision encoder alongside client-specific style indicators derived from batch-level statistics. By merging intricate visual details with textual context, FedCSAP produces robust, context-aware prompt tokens that are both distinct and non-redundant, thereby boosting generalization across seen and unseen classes. Operating within a federated learning paradigm, our approach ensures data privacy through local training and global aggregation, adeptly handling non-IID class distributions and diverse domain-specific styles. Comprehensive experiments on multiple image classification datasets confirm that FedCSAP outperforms existing federated prompt learning methods in both accuracy and overall generalization.

Via

Access Paper or Ask Questions

Sequential Compression Layers for Efficient Federated Learning in Foundational Models

Dec 09, 2024

Navyansh Mahla, Sunny Gupta, Amit Sethi

Abstract:Federated Learning (FL) has gained popularity for fine-tuning large language models (LLMs) across multiple nodes, each with its own private data. While LoRA has been widely adopted for parameter efficient federated fine-tuning, recent theoretical and empirical studies highlight its suboptimal performance in the federated learning context. In response, we propose a novel, simple, and more effective parameter-efficient fine-tuning method that does not rely on LoRA. Our approach introduces a small multi-layer perceptron (MLP) layer between two existing MLP layers the up proj (the FFN projection layer following the self-attention module) and down proj within the feed forward network of the transformer block. This solution addresses the bottlenecks associated with LoRA in federated fine tuning and outperforms recent LoRA-based approaches, demonstrating superior performance for both language models and vision encoders.

Via

Access Paper or Ask Questions

Why Gradient Subspace? Identifying and Mitigating LoRA's Bottlenecks in Federated Fine-Tuning of Large Language Models

Oct 31, 2024

Navyansh Mahla, Ganesh Ramakrishnan

Figure 1 for Why Gradient Subspace? Identifying and Mitigating LoRA's Bottlenecks in Federated Fine-Tuning of Large Language Models

Figure 2 for Why Gradient Subspace? Identifying and Mitigating LoRA's Bottlenecks in Federated Fine-Tuning of Large Language Models

Figure 3 for Why Gradient Subspace? Identifying and Mitigating LoRA's Bottlenecks in Federated Fine-Tuning of Large Language Models

Figure 4 for Why Gradient Subspace? Identifying and Mitigating LoRA's Bottlenecks in Federated Fine-Tuning of Large Language Models

Abstract:Large Language Models (LLMs) have demonstrated remarkable capabilities across various domains, particularly in task generalization for both text and vision data. While fine-tuning these models can significantly enhance their performance on specific downstream tasks, it often requires high-quality data that cannot be shared due to privacy concerns. Federated Learning (FL) offers a promising solution for collaborative training without direct data sharing. However, many parameter-efficient fine-tuning strategies for LLMs in FL, particularly those based on Low-Rank Adaptation (LoRA), face limitations. In this paper, we critically analyze the convergence and performance guarantees of popular FL frameworks utilizing LoRA, highlighting its suboptimal nature due to constrained subspace learning of low-rank matrices. This limitation hinders effective fine-tuning of LLMs in federated settings. Through rigorous analytical and empirical evaluations, we demonstrate that direct weight averaging outperforms LoRA-based strategies, leading to superior performance for fine-tuned models. Our comprehensive comparison exposes inefficiencies in LoRA approaches and underscores the advantages of direct weight aggregation. We extend our analysis to low-rank gradient-based optimizers, such as GaLore, used during local training steps. Our findings show that GaLore is a more effective alternative, outperforming federated LoRA methods like FlexLoRA and FFA-LoRA across both text and image modalities. While privacy remains paramount in FL discourse, our focus is on assessing performance outcomes of federated fine-tuned models and evaluating various FL frameworks from both theoretical and empirical perspectives. Our findings advocate reassessing the reliance on LoRA within FL contexts, paving the way for more efficient training methodologies.

* 24 pages, 10 figures, pre-print

Via

Access Paper or Ask Questions

MedSAGa: Few-shot Memory Efficient Medical Image Segmentation using Gradient Low-Rank Projection in SAM

Jul 21, 2024

Navyansh Mahla, Annie D'souza, Shubh Gupta, Bhavik Kanekar, Kshitij Sharad Jadhav

Figure 1 for MedSAGa: Few-shot Memory Efficient Medical Image Segmentation using Gradient Low-Rank Projection in SAM

Figure 2 for MedSAGa: Few-shot Memory Efficient Medical Image Segmentation using Gradient Low-Rank Projection in SAM

Figure 3 for MedSAGa: Few-shot Memory Efficient Medical Image Segmentation using Gradient Low-Rank Projection in SAM

Figure 4 for MedSAGa: Few-shot Memory Efficient Medical Image Segmentation using Gradient Low-Rank Projection in SAM

Abstract:The application of large-scale models in medical image segmentation demands substantial quantities of meticulously annotated data curated by experts along with high computational resources, both of which are challenges in resource-poor settings. In this study, we present the Medical Segment Anything Model with Galore MedSAGa where we adopt the Segment Anything Model (SAM) to achieve memory-efficient, few-shot medical image segmentation by applying Gradient Low-Rank Projection GaLore to the parameters of the image encoder of SAM. Meanwhile, the weights of the prompt encoder and mask decoder undergo full parameter fine-tuning using standard optimizers. We further assess MedSAGa's few-shot learning capabilities, reporting on its memory efficiency and segmentation performance across multiple standard medical image segmentation datasets. We compare it with several baseline models, including LoRA fine-tuned SAM (SAMed) and DAE-Former. Experiments across multiple datasets and these baseline models with different number of images for fine tuning demonstrated that the GPU memory consumption of MedSAGa is significantly less than that of the baseline models, achieving an average memory efficiency of 66% more than current state-of-the-art (SOTA) models for medical image segmentation. The combination of substantially lower memory requirements and comparable to SOTA results in few-shot learning for medical image segmentation positions MedSAGa as an optimal solution for deployment in resource-constrained settings.

Via

Access Paper or Ask Questions

FESS Loss: Feature-Enhanced Spatial Segmentation Loss for Optimizing Medical Image Analysis

Feb 26, 2024

Charulkumar Chodvadiya, Navyansh Mahla, Kinshuk Gaurav Singh, Kshitij Sharad Jadhav

Abstract:Medical image segmentation is a critical process in the field of medical imaging, playing a pivotal role in diagnosis, treatment, and research. It involves partitioning of an image into multiple regions, representing distinct anatomical or pathological structures. Conventional methods often grapple with the challenge of balancing spatial precision and comprehensive feature representation due to their reliance on traditional loss functions. To overcome this, we propose Feature-Enhanced Spatial Segmentation Loss (FESS Loss), that integrates the benefits of contrastive learning (which extracts intricate features, particularly in the nuanced domain of medical imaging) with the spatial accuracy inherent in the Dice loss. The objective is to augment both spatial precision and feature-based representation in the segmentation of medical images. FESS Loss signifies a notable advancement, offering a more accurate and refined segmentation process, ultimately contributing to heightened precision in the analysis of medical images. Further, FESS loss demonstrates superior performance in limited annotated data availability scenarios often present in the medical domain.

* 5 Pages, 3 figures

Via

Access Paper or Ask Questions