Marco Canini

KAUST

Query-based Knowledge Transfer for Heterogeneous Learning Environments

Apr 12, 2025

Norah Alballa, Wenxuan Zhang, Ziquan Liu, Ahmed M. Abdelmoniem, Mohamed Elhoseiny, Marco Canini

Abstract: Decentralized collaborative learning under data heterogeneity and privacy constraints has rapidly advanced. However, existing solutions such as federated learning, ensembles, and transfer learning often fail to adequately serve the unique needs of clients, especially when local data representation is limited. To address this issue, we propose a novel framework called Query-based Knowledge Transfer (QKT) that enables tailored knowledge acquisition to fulfill specific client needs without direct data exchange. QKT employs a data-free masking strategy to facilitate communication-efficient query-focused knowledge transfer while refining task-specific parameters to mitigate knowledge interference and forgetting. Our experiments, conducted on both standard and clinical benchmarks, show that QKT significantly outperforms existing collaborative learning methods by an average of 20.91 percentage points in single-class query settings and an average of 14.32 percentage points in multi-class query scenarios. Further analysis and ablation studies reveal that QKT effectively balances the learning of new and existing knowledge, showing strong potential for its application in decentralized learning.

* Accepted at ICLR'25

Protecting Confidentiality, Privacy and Integrity in Collaborative Learning

Dec 11, 2024

Dong Chen, Alice Dethise, Istemi Ekin Akkus, Ivica Rimac, Klaus Satzke, Antti Koskela, Marco Canini, Wei Wang, Ruichuan Chen

Abstract: A collaboration between dataset owners and model owners is needed to facilitate effective machine learning (ML) training. During this collaboration, however, dataset owners and model owners want to protect the confidentiality of their respective assets (i.e., datasets, models and training code), with the dataset owners also caring about the privacy of individual users whose data is in their datasets. Existing solutions either provide limited confidentiality for models and training code, or suffer from privacy issues due to collusion. We present Citadel++, a scalable collaborative ML training system designed to simultaneously protect the confidentiality of datasets, models and training code, as well as the privacy of individual users. Citadel++ enhances differential privacy techniques to safeguard the privacy of individual user data while maintaining model utility. By employing Virtual Machine-level Trusted Execution Environments (TEEs) and improved integrity protection techniques through various OS-level mechanisms, Citadel++ effectively preserves the confidentiality of datasets, models and training code, and enforces our privacy mechanisms even when the models and training code have been maliciously designed. Our experiments show that Citadel++ provides privacy, model utility and performance while adhering to confidentiality and privacy requirements of dataset owners and model owners, outperforming the state-of-the-art privacy-preserving training systems by up to 543x on CPU and 113x on GPU TEEs.

ACING: Actor-Critic for Instruction Learning in Black-Box Large Language Models

Nov 19, 2024

Salma Kharrat, Fares Fourati, Marco Canini

Abstract: The effectiveness of Large Language Models (LLMs) in solving tasks depends heavily on the quality of the instructions, which often require fine-tuning through extensive human effort. This highlights the need for automated instruction optimization; however, this optimization is particularly challenging when dealing with black-box LLMs, where model parameters and gradients remain inaccessible. We propose ACING, a task-specific prompt optimization approach framed as a stateless continuous-action Reinforcement Learning (RL) problem, known as the continuum bandit setting. ACING leverages an actor-critic-based method to optimize prompts, learning from non-differentiable reward signals. We validate ACING by optimizing prompts for ChatGPT on 30 instruction-based tasks. ACING consistently outperforms baseline methods, achieving a median score improvement of 10 percentage points. Furthermore, ACING not only recovers but also surpasses human-crafted expert instructions, achieving up to a 39 percentage point improvement against human benchmarks.

Where is the Testbed for my Federated Learning Research?

Jul 19, 2024

Janez Božič, Amândio R. Faustino, Boris Radovič, Marco Canini, Veljko Pejović

Abstract: Progressing beyond centralized AI is of paramount importance, yet distributed AI solutions, in particular various federated learning (FL) algorithms, are often not comprehensively assessed, which prevents the research community from identifying the most promising approaches and practitioners from being convinced that a certain solution is deployment-ready. The largest hurdle towards FL algorithm evaluation is the difficulty of conducting real-world experiments over a variety of FL client devices and different platforms, with different datasets and data distributions, all while assessing various dimensions of algorithm performance, such as inference accuracy, energy consumption, and time to convergence, to name a few. In this paper, we present CoLExT, a real-world testbed for FL research. CoLExT is designed to streamline experimentation with custom FL algorithms in a rich testbed configuration space, with a large number of heterogeneous edge devices, ranging from single-board computers to smartphones, and provides real-time collection and visualization of a variety of metrics through automatic instrumentation. According to our evaluation, porting FL algorithms to CoLExT requires minimal involvement from the developer, and the instrumentation introduces minimal resource usage overhead. Furthermore, through an initial investigation involving popular FL algorithms running on CoLExT, we reveal previously unknown trade-offs, inefficiencies, and programming bugs.

Towards a Flexible and High-Fidelity Approach to Distributed DNN Training Emulation

May 05, 2024

Banruo Liu, Mubarak Adetunji Ojewale, Yuhan Ding, Marco Canini

Abstract: We propose NeuronaBox, a flexible, user-friendly, and high-fidelity approach to emulate DNN training workloads. We argue that to accurately observe performance, it is possible to execute the training workload on a subset of real nodes and emulate the networked execution environment along with the collective communication operations. Initial results from a proof-of-concept implementation show that NeuronaBox replicates the behavior of actual systems with high accuracy, with an error margin of less than 1% between the emulated measurements and the real system.

Practical Insights into Knowledge Distillation for Pre-Trained Models

Feb 22, 2024

Norah Alballa, Marco Canini

Abstract: This research investigates the enhancement of knowledge distillation (KD) processes in pre-trained models, an emerging field in knowledge transfer with significant implications for distributed training and federated learning environments. These environments benefit from reduced communication demands and accommodate various model architectures. Despite the adoption of numerous KD approaches for transferring knowledge among pre-trained models, a comprehensive understanding of KD's application in these scenarios is lacking. Our study conducts an extensive comparison of multiple KD techniques, including standard KD, tuned KD (via optimized temperature and weight parameters), deep mutual learning, and data partitioning KD. We assess these methods across various data distribution strategies to identify the most effective contexts for each. Through detailed examination of hyperparameter tuning, informed by extensive grid search evaluations, we pinpoint when adjustments are crucial to enhance model performance. This paper sheds light on optimal hyperparameter settings for distinct data partitioning scenarios and investigates KD's role in improving federated learning by minimizing communication rounds and expediting the training process. By filling a notable void in current research, our findings serve as a practical framework for leveraging KD in pre-trained models within collaborative and federated learning frameworks.
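
The temperature and weight parameters mentioned in the abstract can be illustrated with the commonly used distillation loss: a weighted sum of hard-label cross-entropy and a temperature-softened KL term between teacher and student. The sketch below is a generic formulation, not necessarily the exact loss used in the paper; the function names and the default values of `temperature` and `alpha` are assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, label, temperature=2.0, alpha=0.5):
    """Generic KD loss (illustrative defaults, not values from the paper).

    (1 - alpha) * CE(student, label) + alpha * T^2 * KL(teacher_T || student_T)
    """
    # Hard-label cross-entropy, computed at temperature 1.
    ce = -math.log(softmax(student_logits)[label])
    # KL divergence between temperature-softened teacher and student outputs.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    # The T^2 factor keeps the soft-term gradient scale comparable across T.
    return (1 - alpha) * ce + alpha * temperature ** 2 * kl
```

Setting `alpha=0` recovers plain supervised training, while `alpha=1` trains purely against the teacher; a grid search such as the one the abstract describes would sweep both `temperature` and `alpha`.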

Flashback: Understanding and Mitigating Forgetting in Federated Learning

Feb 08, 2024

Mohammed Aljahdali, Ahmed M. Abdelmoniem, Marco Canini, Samuel Horváth

Abstract: In Federated Learning (FL), forgetting, or the loss of knowledge across rounds, hampers algorithm convergence, particularly in the presence of severe data heterogeneity among clients. This study explores the nuances of this issue, emphasizing the critical role of forgetting in FL's inefficient learning within heterogeneous data contexts. Knowledge loss occurs in both client-local updates and server-side aggregation steps; addressing one without the other fails to mitigate forgetting. We introduce a metric to measure forgetting granularly, ensuring distinct recognition amid new knowledge acquisition. Leveraging these insights, we propose Flashback, an FL algorithm with a dynamic distillation approach that is used to regularize the local models and effectively aggregate their knowledge. Across different benchmarks, Flashback outperforms other methods, mitigates forgetting, and reaches target accuracy in fewer rounds, converging in 6 to 16 rounds.

Kimad: Adaptive Gradient Compression with Bandwidth Awareness

Dec 13, 2023

Jihao Xin, Ivan Ilin, Shunkang Zhang, Marco Canini, Peter Richtárik

Abstract: In distributed training, communication often emerges as a bottleneck. In response, we introduce Kimad, a solution that offers adaptive gradient compression. By consistently monitoring bandwidth, Kimad refines compression ratios to match specific neural network layer requirements. Our exhaustive tests and proofs confirm Kimad's outstanding performance, establishing it as a benchmark in adaptive compression for distributed deep learning.

Global-QSGD: Practical Floatless Quantization for Distributed Learning with Theoretical Guarantees

May 29, 2023

Jihao Xin, Marco Canini, Peter Richtárik, Samuel Horváth

Abstract: Efficient distributed training is a principal driver of recent advances in deep learning. However, communication often proves costly and becomes the primary bottleneck in these systems. As a result, there is a demand for the design of efficient communication mechanisms that can empirically boost throughput while providing theoretical guarantees. In this work, we introduce Global-QSGD, a novel family of quantization operators, engineered to accelerate distributed training based on global scaling. We demonstrate that Global-QSGD is the first theoretically rigorous Allreduce-compatible compression mechanism that achieves a provable speed-up by striking a balance between compression error and communication savings. Importantly, Global-QSGD does not rely on costly error feedback due to its inherent unbiasedness and offers up to $O(\sqrt{n})$ additional compression ratio compared to the popular QSGD quantization ($n$ represents the number of workers). To obtain theoretical guarantees, we generalize the notion of standard unbiased compression operators to incorporate Global-QSGD. We show that this wider class permits standard analysis for unbiased compressors and thus ensures convergence for popular optimization algorithms (e.g., distributed SGD) under typical settings. For the empirical component of our work, we carry out a performance modeling analysis to determine if Global-QSGD can enhance training throughput under specific hardware configurations. We also conduct extensive empirical evaluations on various tasks, testing our theory on both NVLink and PCIe connections as well as a large-scale cloud system.
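
To make the idea of unbiased quantization with a shared scale concrete, the following is a simplified sketch and not the paper's operator: every worker rounds its gradient stochastically onto the same integer grid, defined by one global scale (here assumed to be the largest worker norm), so the integer codes can be summed directly as an Allreduce would and decoded once. The function names, the choice of scale, and the `levels` parameter are all assumptions made for illustration.

```python
import math
import random


def quantize(vec, scale, levels, rng):
    """Stochastically round each entry to an integer level in [-levels, levels].

    Rounding up happens with probability equal to the fractional part, so the
    expected decoded value equals the input (unbiasedness).
    """
    codes = []
    for x in vec:
        r = abs(x) / scale * levels
        low = math.floor(r)
        q = low + (1 if rng.random() < r - low else 0)
        codes.append(q if x >= 0 else -q)
    return codes


def quantized_mean(worker_grads, levels=4, seed=0):
    """Average gradients using integer codes that share one global scale."""
    rng = random.Random(seed)
    # Assumed global scale: the largest L2 norm across workers.
    scale = max(math.sqrt(sum(x * x for x in g)) for g in worker_grads)
    if scale == 0:
        return [0.0] * len(worker_grads[0])
    codes = [quantize(g, scale, levels, rng) for g in worker_grads]
    # Because the grid is shared, integer codes add up directly.
    summed = [sum(column) for column in zip(*codes)]
    n = len(worker_grads)
    return [scale * s / (levels * n) for s in summed]
```

With per-worker scales, as in plain QSGD, the codes of different workers would live on different grids and could not be summed without first decoding them; sharing the scale is what makes the sum meaningful in this sketch.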

FilFL: Accelerating Federated Learning via Client Filtering

Feb 13, 2023

Fares Fourati, Salma Kharrat, Vaneet Aggarwal, Mohamed-Slim Alouini, Marco Canini

Abstract: Federated learning is an emerging machine learning paradigm that enables devices to train collaboratively without exchanging their local data. The clients participating in the training process are a random subset selected from the pool of clients. This procedure is called client selection, which is an important area in federated learning as it highly impacts the convergence rate, learning efficiency, and generalization. In this work, we introduce client filtering in federated learning (FilFL), a new approach to optimize client selection and training. FilFL first filters the active clients by choosing a subset of them that maximizes a specific objective function; then, a client selection method is applied to that subset. We provide a thorough analysis of its convergence in a heterogeneous setting. Empirical results demonstrate several benefits to our approach, including improved learning efficiency, accelerated convergence ($2$-$3\times$ faster), and higher test accuracy (around $2$-$10$ percentage points higher).
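
The two-stage procedure in the abstract (filter the active clients, then apply a selection method to the filtered subset) can be sketched as follows. The abstract does not specify the objective or the optimizer, so this example substitutes a plain greedy search and a toy objective that rewards label coverage and charges a fixed cost per client; `greedy_filter`, `select_clients`, `coverage_objective`, and the client data are all hypothetical.

```python
import random


def greedy_filter(active, objective):
    """Greedily add the client with the largest positive marginal gain."""
    chosen, remaining = [], list(active)
    best = objective(chosen)
    while remaining:
        gain, pick = max(
            ((objective(chosen + [c]) - best, c) for c in remaining),
            key=lambda pair: pair[0],
        )
        if gain <= 0:
            break  # no remaining client improves the objective
        chosen.append(pick)
        remaining.remove(pick)
        best += gain
    return chosen


def select_clients(active, objective, k, rng):
    """Stage 1: filter. Stage 2: uniform random selection from the filtered set."""
    filtered = greedy_filter(active, objective)
    return rng.sample(filtered, min(k, len(filtered)))


# Toy data (hypothetical): the labels each client holds.
CLIENT_LABELS = {"a": {0, 1}, "b": {1}, "c": {2, 3}, "d": {0}}


def coverage_objective(subset):
    """Distinct labels covered minus a cost of 0.5 per included client."""
    covered = set().union(*(CLIENT_LABELS[c] for c in subset)) if subset else set()
    return len(covered) - 0.5 * len(subset)


filtered = greedy_filter(list(CLIENT_LABELS), coverage_objective)
# filtered == ["a", "c"]: clients b and d add no new labels, only cost.
```

Any selection rule can replace the uniform sampling in `select_clients`; the filtering stage only narrows the pool that the rule draws from.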
