Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jiayu Zhou

Unraveling LoRA Interference: Orthogonal Subspaces for Robust Model Merging

May 28, 2025

Haobo Zhang, Jiayu Zhou

Abstract:Fine-tuning large language models (LMs) for individual tasks yields strong performance but is expensive for deployment and storage. Recent works explore model merging to combine multiple task-specific models into a single multi-task model without additional training. However, existing merging methods often fail for models fine-tuned with low-rank adaptation (LoRA), due to significant performance degradation. In this paper, we show that this issue arises from a previously overlooked interplay between model parameters and data distributions. We propose Orthogonal Subspaces for Robust model Merging (OSRM) to constrain the LoRA subspace *prior* to fine-tuning, ensuring that updates relevant to one task do not adversely shift outputs for others. Our approach can seamlessly integrate with most existing merging algorithms, reducing the unintended interference among tasks. Extensive experiments on eight datasets, tested with three widely used LMs and two large LMs, demonstrate that our method not only boosts merging performance but also preserves single-task accuracy. Furthermore, our approach exhibits greater robustness to the hyperparameters of merging. These results highlight the importance of data-parameter interaction in model merging and offer a plug-and-play solution for merging LoRA models.

* 14 pages, 5 figures, 16 tables, accepted by ACL 2025

Via

Access Paper or Ask Questions

Dynamic Uncertainty Ranking: Enhancing In-Context Learning for Long-Tail Knowledge in LLMs

Oct 31, 2024

Shuyang Yu, Runxue Bao, Parminder Bhatia, Taha Kass-Hout, Jiayu Zhou, Cao Xiao

Abstract:Large language models (LLMs) can learn vast amounts of knowledge from diverse domains during pre-training. However, long-tail knowledge from specialized domains is often scarce and underrepresented, rarely appearing in the models' memorization. Prior work has shown that in-context learning (ICL) with retriever augmentation can help LLMs better capture long-tail knowledge, reducing their reliance on pre-trained data. Despite these advances, we observe that LLM predictions for long-tail questions remain uncertain to variations in retrieved samples. To take advantage of the uncertainty in ICL for guiding LLM predictions toward correct answers on long-tail samples, we propose a reinforcement learning-based dynamic uncertainty ranking method for ICL that accounts for the varying impact of each retrieved sample on LLM predictions. Our approach prioritizes more informative and stable samples while demoting misleading ones, updating rankings based on the feedback from the LLM w.r.t. each retrieved sample. To enhance training efficiency and reduce query costs, we introduce a learnable dynamic ranking threshold, adjusted when the model encounters negative prediction shifts. Experimental results on various question-answering datasets from different domains show that our method outperforms the best baseline by $2.76\%$, with a notable $5.96\%$ boost in accuracy on long-tail questions that elude zero-shot inference.

Via

Access Paper or Ask Questions

Distributed In-Context Learning under Non-IID Among Clients

Jul 31, 2024

Siqi Liang, Sumyeong Ahn, Jiayu Zhou

Figure 1 for Distributed In-Context Learning under Non-IID Among Clients

Figure 2 for Distributed In-Context Learning under Non-IID Among Clients

Figure 3 for Distributed In-Context Learning under Non-IID Among Clients

Figure 4 for Distributed In-Context Learning under Non-IID Among Clients

Abstract:Advancements in large language models (LLMs) have shown their effectiveness in multiple complicated natural language reasoning tasks. A key challenge remains in adapting these models efficiently to new or unfamiliar tasks. In-context learning (ICL) provides a promising solution for few-shot adaptation by retrieving a set of data points relevant to a query, called in-context examples (ICE), from a training dataset and providing them during the inference as context. Most existing studies utilize a centralized training dataset, yet many real-world datasets may be distributed among multiple clients, and remote data retrieval can be associated with costs. Especially when the client data are non-identical independent distributions (non-IID), retrieving from clients a proper set of ICEs needed for a test query presents critical challenges. In this paper, we first show that in this challenging setting, test queries will have different preferences among clients because of non-IIDness, and equal contribution often leads to suboptimal performance. We then introduce a novel approach to tackle the distributed non-IID ICL problem when a data usage budget is present. The principle is that each client's proper contribution (budget) should be designed according to the preference of each query for that client. Our approach uses a data-driven manner to allocate a budget for each client, tailored to each test query. Through extensive empirical studies on diverse datasets, our framework demonstrates superior performance relative to competing baselines.

* 12 pages

Via

Access Paper or Ask Questions

Augmented Risk Prediction for the Onset of Alzheimer's Disease from Electronic Health Records with Large Language Models

May 26, 2024

Jiankun Wang, Sumyeong Ahn, Taykhoom Dalal, Xiaodan Zhang, Weishen Pan, Qiannan Zhang, Bin Chen, Hiroko H. Dodge, Fei Wang, Jiayu Zhou

Figure 1 for Augmented Risk Prediction for the Onset of Alzheimer's Disease from Electronic Health Records with Large Language Models

Figure 2 for Augmented Risk Prediction for the Onset of Alzheimer's Disease from Electronic Health Records with Large Language Models

Figure 3 for Augmented Risk Prediction for the Onset of Alzheimer's Disease from Electronic Health Records with Large Language Models

Figure 4 for Augmented Risk Prediction for the Onset of Alzheimer's Disease from Electronic Health Records with Large Language Models

Abstract:Alzheimer's disease (AD) is the fifth-leading cause of death among Americans aged 65 and older. Screening and early detection of AD and related dementias (ADRD) are critical for timely intervention and for identifying clinical trial participants. The widespread adoption of electronic health records (EHRs) offers an important resource for developing ADRD screening tools such as machine learning based predictive models. Recent advancements in large language models (LLMs) demonstrate their unprecedented capability of encoding knowledge and performing reasoning, which offers them strong potential for enhancing risk prediction. This paper proposes a novel pipeline that augments risk prediction by leveraging the few-shot inference power of LLMs to make predictions on cases where traditional supervised learning methods (SLs) may not excel. Specifically, we develop a collaborative pipeline that combines SLs and LLMs via a confidence-driven decision-making mechanism, leveraging the strengths of SLs in clear-cut cases and LLMs in more complex scenarios. We evaluate this pipeline using a real-world EHR data warehouse from Oregon Health \& Science University (OHSU) Hospital, encompassing EHRs from over 2.5 million patients and more than 20 million patient encounters. Our results show that our proposed approach effectively combines the power of SLs and LLMs, offering significant improvements in predictive performance. This advancement holds promise for revolutionizing ADRD screening and early detection practices, with potential implications for better strategies of patient management and thus improving healthcare.

Via

Access Paper or Ask Questions

Distributed Harmonization: Federated Clustered Batch Effect Adjustment and Generalization

May 23, 2024

Bao Hoang, Yijiang Pang, Siqi Liang, Liang Zhan, Paul Thompson, Jiayu Zhou

Figure 1 for Distributed Harmonization: Federated Clustered Batch Effect Adjustment and Generalization

Figure 2 for Distributed Harmonization: Federated Clustered Batch Effect Adjustment and Generalization

Figure 3 for Distributed Harmonization: Federated Clustered Batch Effect Adjustment and Generalization

Figure 4 for Distributed Harmonization: Federated Clustered Batch Effect Adjustment and Generalization

Abstract:Independent and identically distributed (i.i.d.) data is essential to many data analysis and modeling techniques. In the medical domain, collecting data from multiple sites or institutions is a common strategy that guarantees sufficient clinical diversity, determined by the decentralized nature of medical data. However, data from various sites are easily biased by the local environment or facilities, thereby violating the i.i.d. rule. A common strategy is to harmonize the site bias while retaining important biological information. The ComBat is among the most popular harmonization approaches and has recently been extended to handle distributed sites. However, when faced with situations involving newly joined sites in training or evaluating data from unknown/unseen sites, ComBat lacks compatibility and requires retraining with data from all the sites. The retraining leads to significant computational and logistic overhead that is usually prohibitive. In this work, we develop a novel Cluster ComBat harmonization algorithm, which leverages cluster patterns of the data in different sites and greatly advances the usability of ComBat harmonization. We use extensive simulation and real medical imaging data from ADNI to demonstrate the superiority of the proposed approach.

* 11 pages, 7 figures, accepted to KDD2024-ADS

Via

Access Paper or Ask Questions

Towards Stability of Parameter-free Optimization

May 07, 2024

Yijiang Pang, Shuyang Yu, Bao Hoang, Jiayu Zhou

Figure 1 for Towards Stability of Parameter-free Optimization

Figure 2 for Towards Stability of Parameter-free Optimization

Figure 3 for Towards Stability of Parameter-free Optimization

Figure 4 for Towards Stability of Parameter-free Optimization

Abstract:Hyperparameter tuning, particularly the selection of an appropriate learning rate in adaptive gradient training methods, remains a challenge. To tackle this challenge, in this paper, we propose a novel parameter-free optimizer, AdamG (Adam with the golden step size), designed to automatically adapt to diverse optimization problems without manual tuning. The core technique underlying AdamG is our golden step size derived for the AdaGrad-Norm algorithm, which is expected to help AdaGrad-Norm preserve the tuning-free convergence and approximate the optimal step size in expectation w.r.t. various optimization scenarios. To better evaluate tuning-free performance, we propose a novel evaluation criterion, stability, to comprehensively assess the efficacy of parameter-free optimizers in addition to classical performance criteria. Empirical results demonstrate that compared with other parameter-free baselines, AdamG achieves superior performance, which is consistently on par with Adam using a manually tuned learning rate across various optimization tasks.

Via

Access Paper or Ask Questions

FedGreen: Carbon-aware Federated Learning with Model Size Adaptation

Apr 23, 2024

Ali Abbasi, Fan Dong, Xin Wang, Henry Leung, Jiayu Zhou, Steve Drew

Figure 1 for FedGreen: Carbon-aware Federated Learning with Model Size Adaptation

Figure 2 for FedGreen: Carbon-aware Federated Learning with Model Size Adaptation

Figure 3 for FedGreen: Carbon-aware Federated Learning with Model Size Adaptation

Figure 4 for FedGreen: Carbon-aware Federated Learning with Model Size Adaptation

Abstract:Federated learning (FL) provides a promising collaborative framework to build a model from distributed clients, and this work investigates the carbon emission of the FL process. Cloud and edge servers hosting FL clients may exhibit diverse carbon footprints influenced by their geographical locations with varying power sources, offering opportunities to reduce carbon emissions by training local models with adaptive computations and communications. In this paper, we propose FedGreen, a carbon-aware FL approach to efficiently train models by adopting adaptive model sizes shared with clients based on their carbon profiles and locations using ordered dropout as a model compression technique. We theoretically analyze the trade-offs between the produced carbon emissions and the convergence accuracy, considering the carbon intensity discrepancy across countries to choose the parameters optimally. Empirical studies show that FedGreen can substantially reduce the carbon footprints of FL compared to the state-of-the-art while maintaining competitive model accuracy.

Via

Access Paper or Ask Questions

On the Generalization Ability of Unsupervised Pretraining

Mar 11, 2024

Yuyang Deng, Junyuan Hong, Jiayu Zhou, Mehrdad Mahdavi

Figure 1 for On the Generalization Ability of Unsupervised Pretraining

Figure 2 for On the Generalization Ability of Unsupervised Pretraining

Figure 3 for On the Generalization Ability of Unsupervised Pretraining

Abstract:Recent advances in unsupervised learning have shown that unsupervised pre-training, followed by fine-tuning, can improve model generalization. However, a rigorous understanding of how the representation function learned on an unlabeled dataset affects the generalization of the fine-tuned model is lacking. Existing theoretical research does not adequately account for the heterogeneity of the distribution and tasks in pre-training and fine-tuning stage. To bridge this gap, this paper introduces a novel theoretical framework that illuminates the critical factor influencing the transferability of knowledge acquired during unsupervised pre-training to the subsequent fine-tuning phase, ultimately affecting the generalization capabilities of the fine-tuned model on downstream tasks. We apply our theoretical framework to analyze generalization bound of two distinct scenarios: Context Encoder pre-training with deep neural networks and Masked Autoencoder pre-training with deep transformers, followed by fine-tuning on a binary classification task. Finally, inspired by our findings, we propose a novel regularization method during pre-training to further enhances the generalization of fine-tuned model. Overall, our results contribute to a better understanding of unsupervised pre-training and fine-tuning paradigm, and can shed light on the design of more effective pre-training algorithms.

Via

Access Paper or Ask Questions

Stochastic Two Points Method for Deep Model Zeroth-order Optimization

Feb 02, 2024

Yijiang Pang, Jiayu Zhou

Abstract:Large foundation models, such as large language models, have performed exceptionally well in various application scenarios. Building or fully fine-tuning such large models is usually prohibitive due to either hardware budget or lack of access to backpropagation. The zeroth-order methods offer a promising direction for tackling this challenge, where only forward passes are needed to update the model. This paper introduces an efficient Stochastic Two-Point (S2P) approach within the gradient-free regime. We present the theoretical convergence properties of S2P under the general and relaxed smoothness assumptions. The theoretical properties also shed light on a faster and more stable S2P variant, Accelerated S2P (AS2P), through exploiting our new convergence properties that better represent the dynamics of deep models in training. Our comprehensive empirical results show that AS2P is highly effective in optimizing objectives for large deep models, including language models, and outperforms standard methods across various model types and scales, with 2 $\times$ speed-up in training over most conducted tasks.

Via

Access Paper or Ask Questions

Large Language Models in Medical Term Classification and Unexpected Misalignment Between Response and Reasoning

Dec 19, 2023

Xiaodan Zhang, Sandeep Vemulapalli, Nabasmita Talukdar, Sumyeong Ahn, Jiankun Wang, Han Meng, Sardar Mehtab Bin Murtaza, Aakash Ajay Dave, Dmitry Leshchiner, Dimitri F. Joseph(+4 more)

Figure 1 for Large Language Models in Medical Term Classification and Unexpected Misalignment Between Response and Reasoning

Figure 2 for Large Language Models in Medical Term Classification and Unexpected Misalignment Between Response and Reasoning

Figure 3 for Large Language Models in Medical Term Classification and Unexpected Misalignment Between Response and Reasoning

Figure 4 for Large Language Models in Medical Term Classification and Unexpected Misalignment Between Response and Reasoning

Abstract:This study assesses the ability of state-of-the-art large language models (LLMs) including GPT-3.5, GPT-4, Falcon, and LLaMA 2 to identify patients with mild cognitive impairment (MCI) from discharge summaries and examines instances where the models' responses were misaligned with their reasoning. Utilizing the MIMIC-IV v2.2 database, we focused on a cohort aged 65 and older, verifying MCI diagnoses against ICD codes and expert evaluations. The data was partitioned into training, validation, and testing sets in a 7:2:1 ratio for model fine-tuning and evaluation, with an additional metastatic cancer dataset from MIMIC III used to further assess reasoning consistency. GPT-4 demonstrated superior interpretative capabilities, particularly in response to complex prompts, yet displayed notable response-reasoning inconsistencies. In contrast, open-source models like Falcon and LLaMA 2 achieved high accuracy but lacked explanatory reasoning, underscoring the necessity for further research to optimize both performance and interpretability. The study emphasizes the significance of prompt engineering and the need for further exploration into the unexpected reasoning-response misalignment observed in GPT-4. The results underscore the promise of incorporating LLMs into healthcare diagnostics, contingent upon methodological advancements to ensure accuracy and clinical coherence of AI-generated outputs, thereby improving the trustworthiness of LLMs for medical decision-making.

Via

Access Paper or Ask Questions