Haobo Zhang

Unraveling LoRA Interference: Orthogonal Subspaces for Robust Model Merging

May 28, 2025

Haobo Zhang, Jiayu Zhou

Abstract:Fine-tuning large language models (LMs) for individual tasks yields strong performance but is expensive for deployment and storage. Recent works explore model merging to combine multiple task-specific models into a single multi-task model without additional training. However, existing merging methods often fail for models fine-tuned with low-rank adaptation (LoRA), due to significant performance degradation. In this paper, we show that this issue arises from a previously overlooked interplay between model parameters and data distributions. We propose Orthogonal Subspaces for Robust model Merging (OSRM) to constrain the LoRA subspace *prior* to fine-tuning, ensuring that updates relevant to one task do not adversely shift outputs for others. Our approach can seamlessly integrate with most existing merging algorithms, reducing the unintended interference among tasks. Extensive experiments on eight datasets, tested with three widely used LMs and two large LMs, demonstrate that our method not only boosts merging performance but also preserves single-task accuracy. Furthermore, our approach exhibits greater robustness to the hyperparameters of merging. These results highlight the importance of data-parameter interaction in model merging and offer a plug-and-play solution for merging LoRA models.

* 14 pages, 5 figures, 16 tables, accepted by ACL 2025
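
The idea of constraining the LoRA subspace before fine-tuning can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes a LoRA update of the form `B @ A`, and it assumes that `A` is built from the lowest-variance directions of another task's input features. The helper name `orthogonal_lora_A` and the synthetic features are hypothetical. Under these assumptions, the update barely changes outputs on the other task's inputs.

```python
import numpy as np

def orthogonal_lora_A(other_task_inputs, rank):
    """Return a (rank x d) matrix A whose rows span the directions in which
    the other task's inputs carry the least energy (illustrative assumption)."""
    # Uncentered second-moment matrix of the other task's input features.
    cov = other_task_inputs.T @ other_task_inputs / len(other_task_inputs)
    # eigh returns eigenvalues in ascending order, so the first `rank`
    # eigenvectors are the lowest-energy directions.
    _, eigvecs = np.linalg.eigh(cov)
    return eigvecs[:, :rank].T

rng = np.random.default_rng(0)
d, rank, n = 32, 4, 200

# Hypothetical features of another task, confined to an 8-dimensional subspace.
basis = rng.standard_normal((8, d))
x_other = rng.standard_normal((n, 8)) @ basis

A_orth = orthogonal_lora_A(x_other, rank)          # constrained down-projection
A_rand = rng.standard_normal((rank, d)) / np.sqrt(d)  # unconstrained baseline
B = rng.standard_normal((d, rank))                 # stands in for a trained B

# Output shift that each low-rank update B @ A induces on the other task's inputs.
shift_orth = np.linalg.norm(x_other @ (B @ A_orth).T)
shift_rand = np.linalg.norm(x_other @ (B @ A_rand).T)
print(f"shift on other task: orthogonal={shift_orth:.2e}, random={shift_rand:.2e}")
```

In this toy setting the other task's features have exact low rank, so the constrained update leaves them essentially untouched. Real features are only approximately low-rank, so the shift would be reduced rather than eliminated.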

Towards a Statistical Understanding of Neural Networks: Beyond the Neural Tangent Kernel Theories

Dec 25, 2024

Haobo Zhang, Jianfa Lai, Yicheng Li, Qian Lin, Jun S. Liu

Abstract:A primary advantage of neural networks lies in their feature learning characteristics, which are challenging to analyze theoretically due to the complexity of their training dynamics. We propose a new paradigm for studying feature learning and the resulting benefits in generalizability. After reviewing the neural tangent kernel (NTK) theory and recent results in kernel regression, which address the generalization issue of sufficiently wide neural networks, we examine limitations and implications of the fixed kernel theory (such as the NTK theory) and review recent theoretical advancements in feature learning. Moving beyond the fixed kernel/feature theory, we consider neural networks as adaptive feature models. Finally, we propose an over-parameterized Gaussian sequence model as a prototype model to study the feature learning characteristics of neural networks.

On the Pinsker bound of inner product kernel regression in large dimensions

Sep 02, 2024

Weihao Lu, Jialin Ding, Haobo Zhang, Qian Lin

Abstract:Building on recent studies of large-dimensional kernel regression, particularly those involving inner product kernels on the sphere $\mathbb{S}^{d}$, we investigate the Pinsker bound for inner product kernel regression in such settings. Specifically, we address the scenario where the sample size $n$ is given by $\alpha d^{\gamma}(1+o_{d}(1))$ for some $\alpha, \gamma>0$. We have determined the exact minimax risk for kernel regression in this setting, not only identifying the minimax rate but also the exact constant, known as the Pinsker constant, associated with the excess risk.

On the Saturation Effect of Kernel Ridge Regression

May 15, 2024

Yicheng Li, Haobo Zhang, Qian Lin

Abstract:The saturation effect refers to the phenomenon that kernel ridge regression (KRR) fails to achieve the information-theoretic lower bound when the smoothness of the underlying truth function exceeds a certain level. The saturation effect has been widely observed in practice, and a saturation lower bound of KRR has been conjectured for decades. In this paper, we provide a proof of this long-standing conjecture.

* ICLR 2023; Minor errors are corrected in this version
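
For readers unfamiliar with the estimator discussed here, the following is a minimal sketch of kernel ridge regression in NumPy. It is not taken from the paper. The Gaussian kernel, its bandwidth, the target function, the noise level, and the candidate values of the regularization parameter `lam` are all illustrative assumptions. The sketch only shows how the fit depends on `lam` and does not attempt to reproduce the saturation lower bound.

```python
import numpy as np

def rbf_kernel(a, b, bandwidth=0.2):
    """Gaussian kernel matrix between two 1-D sample arrays (assumed kernel)."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return np.exp(-sq_dist / (2 * bandwidth ** 2))

def krr_fit_predict(x_train, y_train, x_test, lam):
    """Kernel ridge regression: solve (K + n*lam*I) alpha = y, then predict."""
    n = len(x_train)
    K = rbf_kernel(x_train, x_train)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y_train)
    return rbf_kernel(x_test, x_train) @ alpha

def target(x):
    """Hypothetical smooth regression function used only for this demo."""
    return np.sin(2 * np.pi * x)

rng = np.random.default_rng(0)
n = 200
x_train = rng.uniform(0.0, 1.0, n)
y_train = target(x_train) + 0.1 * rng.standard_normal(n)
x_test = np.linspace(0.0, 1.0, 500)

# Mean squared error against the noiseless target for each candidate lam.
errors = {}
for lam in (1e-6, 1e-3, 1.0):
    pred = krr_fit_predict(x_train, y_train, x_test, lam)
    errors[lam] = float(np.mean((pred - target(x_test)) ** 2))
    print(f"lam={lam:g}  test MSE={errors[lam]:.4f}")
```

A very large `lam` shrinks the fit toward zero, while a very small one lets the estimator follow the noise, which is why the abstract speaks of results for a suitably chosen regularization level.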

The phase diagram of kernel interpolation in large dimensions

Apr 19, 2024

Haobo Zhang, Weihao Lu, Qian Lin

Abstract:The generalization ability of kernel interpolation in large dimensions (i.e., $n \asymp d^{\gamma}$ for some $\gamma>0$) might be one of the most interesting problems in the recent renaissance of kernel regression, since it may help us understand the 'benign overfitting phenomenon' reported in the neural networks literature. Focusing on the inner product kernel on the sphere, we fully characterized the exact order of both the variance and bias of large-dimensional kernel interpolation under various source conditions $s\geq 0$. Consequently, we obtained the $(s,\gamma)$-phase diagram of large-dimensional kernel interpolation, i.e., we determined the regions in $(s,\gamma)$-plane where the kernel interpolation is minimax optimal, sub-optimal and inconsistent.

* 18 pages, 1 figure

Multi-target Detection for Reconfigurable Holographic Surfaces Enabled Radar

Jan 17, 2024

Xiaoyu Zhang, Haobo Zhang, Ruoqi Deng, Liang Liu, Boya Di

Abstract:Multi-target detection is one of the primary tasks in radar-based localization and sensing, typically built on phased array antennas. However, the bulky hardware in the phased array restricts its potential for enhancing detection accuracy, since the cost and power of the phased array can become unaffordable as its physical aperture scales up to pursue higher beam shaping capabilities. To resolve this issue, we propose a radar system enabled by reconfigurable holographic surfaces (RHSs), a novel meta-surface antenna composed of meta-material elements with cost-effective and power-efficient hardware, which performs multi-target detection in an adaptive manner. Different from the phase-control structure in the phased array, the RHS is able to apply beamforming by controlling the radiation amplitudes of its elements. Consequently, traditional beamforming schemes designed for phased arrays cannot be directly applied to RHSs due to this structural difference. To tackle this challenge, a waveform and amplitude optimization algorithm (WAOA) is designed to jointly optimize the radar waveform and RHS amplitudes in order to improve the detection accuracy. Simulation results reveal that the proposed RHS-enabled radar increases the probability of detection by 0.13 compared to phased array radars when six iterations of adaptive detection are performed given the same hardware cost.

Reconfigurable Holographic Surface Aided Wireless Simultaneous Localization and Mapping

Jan 16, 2024

Haobo Zhang, Ziang Yang, Hongliang Zhang, Boya Di, Lingyang Song

Abstract:As a crucial facilitator of future autonomous driving applications, wireless simultaneous localization and mapping (SLAM) has drawn growing attention recently. However, the accuracy of existing wireless SLAM schemes is limited because the antenna gain is constrained given the cost budget due to the expensive hardware components such as phased arrays. To address this issue, we propose a reconfigurable holographic surface (RHS)-aided SLAM system in this paper. The RHS is a novel type of low-cost antenna that can cut down the hardware cost by replacing phased arrays in conventional SLAM systems. However, compared with a phased array where the phase shifts of parallel-fed signals are adjusted, the RHS exhibits a different radiation model because its amplitude-controlled radiation elements are series-fed by surface waves, implying that traditional schemes cannot be applied directly. To address this challenge, we propose an RHS-aided beam steering method for sensing the surrounding environment and design the corresponding SLAM algorithm. Simulation results show that the proposed scheme can achieve more than three times the localization accuracy that traditional wireless SLAM with the same cost achieves.

Unified Near-field and Far-field Localization with Holographic MIMO

Jan 12, 2024

Mengyuan Cao, Haobo Zhang, Boya Di, Hongliang Zhang

Abstract:Localization that uses a holographic multiple-input multiple-output surface, such as a reconfigurable intelligent surface (RIS), has gained increasing attention due to its ability to accurately localize users in non-line-of-sight conditions. However, existing RIS-enabled localization methods assume that users are in either the near-field (NF) or the far-field (FF) region, which results in high complexity or low localization accuracy, respectively, when they are applied to the whole area. In this paper, a unified NF and FF localization method is proposed for the RIS-enabled localization system to overcome the above issue. Specifically, the NF and FF regions are both divided into grids. The RIS reflects the signals from the user to the base station (BS), and then the BS uses the received signals to determine the grid where the user is located. Compared with existing NF- or FF-only schemes, the design of the location estimation method and the RIS phase shift optimization algorithm is more challenging because they are based on a hybrid NF and FF model. To tackle these challenges, we formulate the optimization problems for location estimation and RIS phase shifts, and design two algorithms to effectively solve the formulated problems, respectively. The effectiveness of the proposed method is verified through simulations.

Optimal Rates of Kernel Ridge Regression under Source Condition in Large Dimensions

Jan 02, 2024

Haobo Zhang, Yicheng Li, Weihao Lu, Qian Lin

Abstract:Motivated by the studies of neural networks (e.g., the neural tangent kernel theory), we perform a study on the large-dimensional behavior of kernel ridge regression (KRR) where the sample size $n \asymp d^{\gamma}$ for some $\gamma > 0$. Given an RKHS $\mathcal{H}$ associated with an inner product kernel defined on the sphere $\mathbb{S}^{d}$, we suppose that the true function $f_{\rho}^{*} \in [\mathcal{H}]^{s}$, the interpolation space of $\mathcal{H}$ with source condition $s>0$. We first determined the exact order (both upper and lower bound) of the generalization error of kernel ridge regression for the optimally chosen regularization parameter $\lambda$. We then further showed that when $0<s\le1$, KRR is minimax optimal; and when $s>1$, KRR is not minimax optimal (a.k.a. the saturation effect). Our results illustrate that the curves of rate varying along $\gamma$ exhibit the periodic plateau behavior and the multiple descent behavior and show how the curves evolve with $s>0$. Interestingly, our work provides a unified viewpoint of several recent works on kernel regression in the large-dimensional setting, which correspond to $s=0$ and $s=1$ respectively.

* 61 pages, 11 figures

On the Asymptotic Learning Curves of Kernel Ridge Regression under Power-law Decay

Sep 23, 2023

Yicheng Li, Haobo Zhang, Qian Lin

Abstract:The widely observed 'benign overfitting phenomenon' in the neural network literature challenges the 'bias-variance trade-off' doctrine in statistical learning theory. Since the generalization ability of the 'lazy trained' over-parametrized neural network can be well approximated by that of the neural tangent kernel regression, the curve of the excess risk (namely, the learning curve) of kernel ridge regression has attracted increasing attention recently. However, most recent arguments on the learning curve are heuristic and are based on the 'Gaussian design' assumption. In this paper, under mild and more realistic assumptions, we rigorously provide a full characterization of the learning curve, elaborating the effect and the interplay of the choice of the regularization parameter, the source condition and the noise. In particular, our results suggest that the 'benign overfitting phenomenon' exists in very wide neural networks only when the noise level is small.
