Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Fady Rezk

Model Diffusion for Certifiable Few-shot Transfer Learning

Feb 10, 2025

Fady Rezk, Royson Lee, Henry Gouk, Timothy Hospedales, Minyoung Kim

Abstract:In modern large-scale deep learning, a prevalent and effective workflow for solving low-data problems is adapting powerful pre-trained foundation models (FMs) to new tasks via parameter-efficient fine-tuning (PEFT). However, while empirically effective, the resulting solutions lack generalisation guarantees to certify their accuracy - which may be required for ethical or legal reasons prior to deployment in high-importance applications. In this paper we develop a novel transfer learning approach that is designed to facilitate non-vacuous learning theoretic generalisation guarantees for downstream tasks, even in the low-shot regime. Specifically, we first use upstream tasks to train a distribution over PEFT parameters. We then learn the downstream task by a sample-and-evaluate procedure -- sampling plausible PEFTs from the trained diffusion model and selecting the one with the highest likelihood on the downstream data. Crucially, this confines our model hypothesis to a finite set of PEFT samples. In contrast to learning in the typical continuous hypothesis spaces of neural network weights, this facilitates tighter risk certificates. We instantiate our bound and show non-trivial generalization guarantees compared to existing learning approaches which lead to vacuous bounds in the low-shot regime.

Via

Access Paper or Ask Questions

FedP$^2$EFT: Federated Learning to Personalize Parameter Efficient Fine-Tuning for Multilingual LLMs

Feb 05, 2025

Royson Lee, Minyoung Kim, Fady Rezk, Rui Li, Stylianos I. Venieris, Timothy Hospedales

Figure 1 for FedP$^2$EFT: Federated Learning to Personalize Parameter Efficient Fine-Tuning for Multilingual LLMs

Figure 2 for FedP$^2$EFT: Federated Learning to Personalize Parameter Efficient Fine-Tuning for Multilingual LLMs

Figure 3 for FedP$^2$EFT: Federated Learning to Personalize Parameter Efficient Fine-Tuning for Multilingual LLMs

Figure 4 for FedP$^2$EFT: Federated Learning to Personalize Parameter Efficient Fine-Tuning for Multilingual LLMs

Abstract:Federated learning (FL) has enabled the training of multilingual large language models (LLMs) on diverse and decentralized multilingual data, especially on low-resource languages. To improve client-specific performance, personalization via the use of parameter-efficient fine-tuning (PEFT) modules such as LoRA is common. This involves a personalization strategy (PS), such as the design of the PEFT adapter structures (e.g., in which layers to add LoRAs and what ranks) and choice of hyperparameters (e.g., learning rates) for fine-tuning. Instead of manual PS configuration, we propose FedP$^2$EFT, a federated learning-to-personalize method for multilingual LLMs in cross-device FL settings. Unlike most existing PEFT structure selection methods, which are prone to overfitting low-data regimes, FedP$^2$EFT collaboratively learns the optimal personalized PEFT structure for each client via Bayesian sparse rank selection. Evaluations on both simulated and real-world multilingual FL benchmarks demonstrate that FedP$^2$EFT largely outperforms existing personalized fine-tuning methods, while complementing a range of existing FL methods.

* Preprint

Via

Access Paper or Ask Questions

On Training Implicit Meta-Learning With Applications to Inductive Weighing in Consistency Regularization

Oct 28, 2023

Fady Rezk

Figure 1 for On Training Implicit Meta-Learning With Applications to Inductive Weighing in Consistency Regularization

Figure 2 for On Training Implicit Meta-Learning With Applications to Inductive Weighing in Consistency Regularization

Figure 3 for On Training Implicit Meta-Learning With Applications to Inductive Weighing in Consistency Regularization

Figure 4 for On Training Implicit Meta-Learning With Applications to Inductive Weighing in Consistency Regularization

Abstract:Meta-learning that uses implicit gradient have provided an exciting alternative to standard techniques which depend on the trajectory of the inner loop training. Implicit meta-learning (IML), however, require computing $2^{nd}$ order gradients, particularly the Hessian which is impractical to compute for modern deep learning models. Various approximations for the Hessian were proposed but a systematic comparison of their compute cost, stability, generalization of solution found and estimation accuracy were largely overlooked. In this study, we start by conducting a systematic comparative analysis of the various approximation methods and their effect when incorporated into IML training routines. We establish situations where catastrophic forgetting is exhibited in IML and explain their cause in terms of the inability of the approximations to estimate the curvature at convergence points. Sources of IML training instability are demonstrated and remedied. A detailed analysis of the effeciency of various inverse Hessian-vector product approximation methods is also provided. Subsequently, we use the insights gained to propose and evaluate a novel semi-supervised learning algorithm that learns to inductively weigh consistency regularization losses. We show how training a "Confidence Network" to extract domain specific features can learn to up-weigh useful images and down-weigh out-of-distribution samples. Results outperform the baseline FixMatch performance.

Via

Access Paper or Ask Questions

Is Scaling Learned Optimizers Worth It? Evaluating The Value of VeLO's 4000 TPU Months

Oct 27, 2023

Fady Rezk, Antreas Antoniou, Henry Gouk, Timothy Hospedales

Abstract:We analyze VeLO (versatile learned optimizer), the largest scale attempt to train a general purpose "foundational" optimizer to date. VeLO was trained on thousands of machine learning tasks using over 4000 TPU months with the goal of producing an optimizer capable of generalizing to new problems while being hyperparameter free, and outperforming industry standards such as Adam. We independently evaluate VeLO on the MLCommons optimizer benchmark suite. We find that, contrary to initial claims: (1) VeLO has a critical hyperparameter that needs problem-specific tuning, (2) VeLO does not necessarily outperform competitors in quality of solution found, and (3) VeLO is not faster than competing optimizers at reducing the training loss. These observations call into question VeLO's generality and the value of the investment in training it.

Via

Access Paper or Ask Questions