Abstract:Federated learning (FL) has emerged as a promising paradigm for fine-tuning foundation models using distributed data in a privacy-preserving manner. Under limited computational resources, clients often find it more practical to fine-tune a selected subset of layers, rather than the entire model, based on their task-specific data. In this study, we provide a thorough theoretical exploration of selective layer fine-tuning in FL, emphasizing a flexible approach that allows the clients to adjust their selected layers according to their local data and resources. We theoretically demonstrate that the layer selection strategy has a significant impact on model convergence in two critical aspects: the importance of selected layers and the heterogeneous choices across clients. Drawing from these insights, we further propose a strategic layer selection method that utilizes local gradients and regulates layer selections across clients. The extensive experiments on both image and text datasets demonstrate the effectiveness of the proposed strategy compared with several baselines, highlighting its advances in identifying critical layers that adapt to the client heterogeneity and training dynamics in FL.
Abstract:We consider the distributed learning problem with data dispersed across multiple workers under the orchestration of a central server. Asynchronous Stochastic Gradient Descent (SGD) has been widely explored in such a setting to reduce the synchronization overhead associated with parallelization. However, the performance of asynchronous SGD algorithms often depends on a bounded dissimilarity condition among the workers' local data, a condition that can drastically affect their efficiency when the workers' data are highly heterogeneous. To overcome this limitation, we introduce the \textit{dual-delayed asynchronous SGD (DuDe-ASGD)} algorithm designed to neutralize the adverse effects of data heterogeneity. DuDe-ASGD makes full use of stale stochastic gradients from all workers during asynchronous training, leading to two distinct time lags in the model parameters and data samples utilized in the server's iterations. Furthermore, by adopting an incremental aggregation strategy, DuDe-ASGD maintains a per-iteration computational cost that is on par with traditional asynchronous SGD algorithms. Our analysis demonstrates that DuDe-ASGD achieves a near-minimax-optimal convergence rate for smooth nonconvex problems, even when the data across workers are extremely heterogeneous. Numerical experiments indicate that DuDe-ASGD compares favorably with existing asynchronous and synchronous SGD-based algorithms.
Abstract:Federated learning (FL) has attracted vivid attention as a privacy-preserving distributed learning framework. In this work, we focus on cross-silo FL, where clients become the model owners after training and are only concerned about the model's generalization performance on their local data. Due to the data heterogeneity issue, asking all the clients to join a single FL training process may result in model performance degradation. To investigate the effectiveness of collaboration, we first derive a generalization bound for each client when collaborating with others or when training independently. We show that the generalization performance of a client can be improved only by collaborating with other clients that have more training data and similar data distribution. Our analysis allows us to formulate a client utility maximization problem by partitioning clients into multiple collaborating groups. A hierarchical clustering-based collaborative training (HCCT) scheme is then proposed, which does not need to fix in advance the number of groups. We further analyze the convergence of HCCT for general non-convex loss functions which unveils the effect of data similarity among clients. Extensive simulations show that HCCT achieves better generalization performance than baseline schemes, whereas it degenerates to independent training and conventional FL in specific scenarios.
Abstract:Federated learning (FL) has emerged as a privacy-preserving paradigm that trains neural networks on edge devices without collecting data at a central server. However, FL encounters an inherent challenge in dealing with non-independent and identically distributed (non-IID) data among devices. To address this challenge, this paper proposes a hard feature matching data synthesis (HFMDS) method to share auxiliary data besides local models. Specifically, synthetic data are generated by learning the essential class-relevant features of real samples and discarding the redundant features, which helps to effectively tackle the non-IID issue. For better privacy preservation, we propose a hard feature augmentation method to transfer real features towards the decision boundary, with which the synthetic data not only improve the model generalization but also erase the information of real features. By integrating the proposed HFMDS method with FL, we present a novel FL framework with data augmentation to relieve data heterogeneity. The theoretical analysis highlights the effectiveness of our proposed data synthesis method in solving the non-IID challenge. Simulation results further demonstrate that our proposed HFMDS-FL algorithm outperforms the baselines in terms of accuracy, privacy preservation, and computational cost on various benchmark datasets.
Abstract:Federated learning (FL) has emerged as a highly effective paradigm for privacy-preserving collaborative training among different parties. Unlike traditional centralized learning, which requires collecting data from each party, FL allows clients to share privacy-preserving information without exposing private datasets. This approach not only guarantees enhanced privacy protection but also facilitates more efficient and secure collaboration among multiple participants. Therefore, FL has gained considerable attention from researchers, promoting numerous surveys to summarize the related works. However, the majority of these surveys concentrate on methods sharing model parameters during the training process, while overlooking the potential of sharing other forms of local information. In this paper, we present a systematic survey from a new perspective, i.e., what to share in FL, with an emphasis on the model utility, privacy leakage, and communication efficiency. This survey differs from previous ones due to four distinct contributions. First, we present a new taxonomy of FL methods in terms of the sharing methods, which includes three categories of shared information: model sharing, synthetic data sharing, and knowledge sharing. Second, we analyze the vulnerability of different sharing methods to privacy attacks and review the defense mechanisms that provide certain privacy guarantees. Third, we conduct extensive experiments to compare the performance and communication overhead of various sharing methods in FL. Besides, we assess the potential privacy leakage through model inversion and membership inference attacks, while comparing the effectiveness of various defense approaches. Finally, we discuss potential deficiencies in current methods and outline future directions for improvement.
Abstract:Federated learning (FL) is a promising framework for privacy-preserving collaborative learning, where model training tasks are distributed to clients and only the model updates need to be collected at a server. However, when being deployed at mobile edge networks, clients may have unpredictable availability and drop out of the training process, which hinders the convergence of FL. This paper tackles such a critical challenge. Specifically, we first investigate the convergence of the classical FedAvg algorithm with arbitrary client dropouts. We find that with the common choice of a decaying learning rate, FedAvg oscillates around a stationary point of the global loss function, which is caused by the divergence between the aggregated and desired central update. Motivated by this new observation, we then design a novel training algorithm named MimiC, where the server modifies each received model update based on the previous ones. The proposed modification of the received model updates mimics the imaginary central update irrespective of dropout clients. The theoretical analysis of MimiC shows that divergence between the aggregated and central update diminishes with proper learning rates, leading to its convergence. Simulation results further demonstrate that MimiC maintains stable convergence performance and learns better models than the baseline methods.
Abstract:Federated learning (FL) is a popular privacy-preserving distributed training scheme, where multiple devices collaborate to train machine learning models by uploading local model updates. To improve communication efficiency, over-the-air computation (AirComp) has been applied to FL, which leverages analog modulation to harness the superposition property of radio waves such that numerous devices can upload their model updates concurrently for aggregation. However, the uplink channel noise incurs considerable model aggregation distortion, which is critically determined by the device scheduling and compromises the learned model performance. In this paper, we propose a probabilistic device scheduling framework for over-the-air FL, named PO-FL, to mitigate the negative impact of channel noise, where each device is scheduled according to a certain probability and its model update is reweighted using this probability in aggregation. We prove the unbiasedness of this aggregation scheme and demonstrate the convergence of PO-FL on both convex and non-convex loss functions. Our convergence bounds unveil that the device scheduling affects the learning performance through the communication distortion and global update variance. Based on the convergence analysis, we further develop a channel and gradient-importance aware algorithm to optimize the device scheduling probabilities in PO-FL. Extensive simulation results show that the proposed PO-FL framework with channel and gradient-importance awareness achieves faster convergence and produces better models than baseline methods.
Abstract:Federated learning (FL) attempts to train a global model by aggregating local models from distributed devices under the coordination of a central server. However, the existence of a large number of heterogeneous devices makes FL vulnerable to various attacks, especially the stealthy backdoor attack. Backdoor attack aims to trick a neural network to misclassify data to a target label by injecting specific triggers while keeping correct predictions on original training data. Existing works focus on client-side attacks which try to poison the global model by modifying the local datasets. In this work, we propose a new attack model for FL, namely Data-Agnostic Backdoor attack at the Server (DABS), where the server directly modifies the global model to backdoor an FL system. Extensive simulation results show that this attack scheme achieves a higher attack success rate compared with baseline methods while maintaining normal accuracy on the clean data.
Abstract:Federated learning (FL) has achieved great success as a privacy-preserving distributed training paradigm, where many edge devices collaboratively train a machine learning model by sharing the model updates instead of the raw data with a server. However, the heterogeneous computational and communication resources of edge devices give rise to stragglers that significantly decelerate the training process. To mitigate this issue, we propose a novel FL framework named stochastic coded federated learning (SCFL) that leverages coded computing techniques. In SCFL, before the training process starts, each edge device uploads a privacy-preserving coded dataset to the server, which is generated by adding Gaussian noise to the projected local dataset. During training, the server computes gradients on the global coded dataset to compensate for the missing model updates of the straggling devices. We design a gradient aggregation scheme to ensure that the aggregated model update is an unbiased estimate of the desired global update. Moreover, this aggregation scheme enables periodical model averaging to improve the training efficiency. We characterize the tradeoff between the convergence performance and privacy guarantee of SCFL. In particular, a more noisy coded dataset provides stronger privacy protection for edge devices but results in learning performance degradation. We further develop a contract-based incentive mechanism to coordinate such a conflict. The simulation results show that SCFL learns a better model within the given time and achieves a better privacy-performance tradeoff than the baseline methods. In addition, the proposed incentive mechanism grants better training performance than the conventional Stackelberg game approach.
Abstract:Federated learning (FL) strives to enable collaborative training of machine learning models without centrally collecting clients' private data. Different from centralized training, the local datasets across clients in FL are non-independent and identically distributed (non-IID). In addition, the data-owning clients may drop out of the training process arbitrarily. These characteristics will significantly degrade the training performance. This paper proposes a Dropout-Resilient Secure Federated Learning (DReS-FL) framework based on Lagrange coded computing (LCC) to tackle both the non-IID and dropout problems. The key idea is to utilize Lagrange coding to secretly share the private datasets among clients so that each client receives an encoded version of the global dataset, and the local gradient computation over this dataset is unbiased. To correctly decode the gradient at the server, the gradient function has to be a polynomial in a finite field, and thus we construct polynomial integer neural networks (PINNs) to enable our framework. Theoretical analysis shows that DReS-FL is resilient to client dropouts and provides privacy protection for the local datasets. Furthermore, we experimentally demonstrate that DReS-FL consistently leads to significant performance gains over baseline methods.