Abstract:In this paper, we depart from the widely-used gradient descent-based hierarchical federated learning (FL) algorithms to develop a novel hierarchical FL framework based on the alternating direction method of multipliers (ADMM). Within this framework, we propose two novel FL algorithms, which both use ADMM in the top layer: one that employs ADMM in the lower layer and another that uses the conventional gradient descent-based approach. The proposed framework enhances privacy, and experiments demonstrate the superiority of the proposed algorithms compared to the conventional algorithms in terms of learning convergence and accuracy. Additionally, gradient descent on the lower layer performs well even if the number of local steps is very limited, while ADMM on both layers lead to better performance otherwise.
Abstract:This paper introduces a new federated learning scheme that leverages over-the-air computation. A novel feature of this scheme is the proposal to employ adaptive weights during aggregation, a facet treated as predefined in other over-the-air schemes. This can mitigate the impact of wireless channel conditions on learning performance, without needing channel state information at transmitter side (CSIT). We provide a mathematical methodology to derive the convergence bound for the proposed scheme in the context of computational heterogeneity and general loss functions, supplemented with design insights. Accordingly, we propose aggregation cost metrics and efficient algorithms to find optimized weights for the aggregation. Finally, through numerical experiments, we validate the effectiveness of the proposed scheme. Even with the challenges posed by channel conditions and device heterogeneity, the proposed scheme surpasses other over-the-air strategies by an accuracy improvement of 15% over the scheme using CSIT and 30% compared to the one without CSIT.
Abstract:This paper introduces a federated learning framework that enables over-the-air computation via digital communications, using a new joint source-channel coding scheme. Without relying on channel state information at devices, this scheme employs lattice codes to both quantize model parameters and exploit interference from the devices. We propose a novel receiver structure at the server, designed to reliably decode an integer combination of the quantized model parameters as a lattice point for the purpose of aggregation. We present a mathematical approach to derive a convergence bound for the proposed scheme and offer design remarks. In this context, we suggest an aggregation metric and a corresponding algorithm to determine effective integer coefficients for the aggregation in each communication round. Our results illustrate that, regardless of channel dynamics and data heterogeneity, our scheme consistently delivers superior learning accuracy across various parameters and markedly surpasses other over-the-air methodologies.
Abstract:This paper presents a novel hierarchical federated learning algorithm within multiple sets that incorporates quantization for communication-efficiency and demonstrates resilience to statistical heterogeneity. Unlike conventional hierarchical federated learning algorithms, our approach combines gradient aggregation in intra-set iterations with model aggregation in inter-set iterations. We offer a comprehensive analytical framework to evaluate its optimality gap and convergence rate, comparing these aspects with those of conventional algorithms. Additionally, we develop a problem formulation to derive optimal system parameters in a closed-form solution. Our findings reveal that our algorithm consistently achieves high learning accuracy over a range of parameters and significantly outperforms other hierarchical algorithms, particularly in scenarios with heterogeneous data distributions.
Abstract:This paper introduces a universal federated learning framework that enables over-the-air computation via digital communications, using a new joint source-channel coding scheme. Without relying on channel state information at devices, this scheme employs lattice codes to both quantize model parameters and exploit interference from the devices. A novel two-layer receiver structure at the server is designed to reliably decode an integer combination of the quantized model parameters as a lattice point for the purpose of aggregation. Numerical experiments validate the effectiveness of the proposed scheme. Even with the challenges posed by channel conditions and device heterogeneity, the proposed scheme markedly surpasses other over-the-air FL strategies.
Abstract:When implementing hierarchical federated learning over wireless networks, scalability assurance and the ability to handle both interference and device data heterogeneity are crucial. This work introduces a learning method designed to address these challenges, along with a scalable transmission scheme that efficiently uses a single wireless resource through over-the-air computation. To provide resistance against data heterogeneity, we employ gradient aggregations. Meanwhile, the impact of interference is minimized through optimized receiver normalizing factors. For this, we model a multi-cluster wireless network using stochastic geometry, and characterize the mean squared error of the aggregation estimations as a function of the network parameters. We show that despite the interference and the data heterogeneity, the proposed scheme achieves high learning accuracy and can significantly outperform the conventional hierarchical algorithm.
Abstract:In this work, we propose a communication-efficient two-layer federated learning algorithm for distributed setups including a core server and multiple edge servers with clusters of devices. Assuming different learning tasks, clusters with a same task collaborate. To implement the algorithm over wireless links, we propose a scalable clustered over-the-air aggregation scheme for the uplink with a bandwidth-limited broadcast scheme for the downlink that requires only two single resource blocks for each algorithm iteration, independent of the number of edge servers and devices. This setup is faced with interference of devices in the uplink and interference of edge servers in the downlink that are to be modeled rigorously. We first develop a spatial model for the setup by modeling devices as a Poisson cluster process over the edge servers and quantify uplink and downlink error terms due to the interference. Accordingly, we present a comprehensive mathematical approach to derive the convergence bound for the proposed algorithm including any number of collaborating clusters in the setup and provide important special cases and design remarks. Finally, we show that despite the interference in the proposed uplink and downlink schemes, the proposed algorithm achieves high learning accuracy for a variety of parameters.