Abstract: Over-the-air federated learning (OTA-FL) is an emerging technique to reduce the computation and communication overload at the parameter server (PS) caused by the orthogonal transmissions of the model updates in conventional federated learning (FL). This reduction is achieved at the expense of introducing aggregation error, which can be efficiently suppressed by means of receive beamforming via large antenna arrays. This paper studies OTA-FL in massive multiple-input multiple-output (MIMO) systems by considering a realistic scenario in which the edge server, despite its large antenna array, is restricted in the number of radio frequency (RF) chains. For this setting, the beamforming for over-the-air model aggregation needs to be addressed jointly with antenna selection. This leads to an NP-hard problem due to the combinatorial nature of the optimization. We tackle this problem via two different approaches. In the first approach, we use the penalty dual decomposition (PDD) technique to develop a two-tier algorithm for joint antenna selection and beamforming. The second approach interprets the antenna selection task as a sparse recovery problem and develops two iterative joint algorithms based on the Lasso and fast iterative soft-thresholding methods. Convergence and complexity analyses are presented for all the schemes. The numerical investigations show that the algorithms based on sparse recovery techniques outperform the PDD-based algorithm when the number of RF chains at the edge server is much smaller than its array size. However, as the number of RF chains increases, the PDD approach becomes superior. Our simulations further show that the learning performance with all antennas active at the PS can be closely tracked by selecting fewer than 20% of the antennas at the PS.
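The sparse-recovery view of antenna selection can be illustrated with a generic fast iterative soft-thresholding (FISTA) loop applied to a Lasso problem. The following is only a minimal sketch under stated assumptions, not the joint selection and beamforming algorithm of the paper: it uses a real-valued toy model, and the names `soft_threshold`, `fista_lasso`, `lasso_objective`, the matrix `A`, the weight `lam`, and the problem sizes are all hypothetical choices made for illustration.

```python
import numpy as np

def soft_threshold(x, tau):
    # Proximal operator of tau * ||x||_1: shrink each entry toward zero by tau.
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def lasso_objective(A, y, x, lam):
    # Lasso cost: 0.5 * ||A x - y||^2 + lam * ||x||_1.
    return 0.5 * np.sum((A @ x - y) ** 2) + lam * np.sum(np.abs(x))

def fista_lasso(A, y, lam, num_iters=500):
    # Lipschitz constant of the gradient of the smooth term 0.5 * ||A x - y||^2.
    lipschitz = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    z = x.copy()  # extrapolated point used for the gradient step
    t = 1.0       # momentum parameter
    for _ in range(num_iters):
        grad = A.T @ (A @ z - y)
        x_new = soft_threshold(z - grad / lipschitz, lam / lipschitz)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = x_new + ((t - 1.0) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x

# Toy instance (assumed sizes): 100 candidate entries, only 4 of them active.
rng = np.random.default_rng(0)
A = rng.standard_normal((60, 100))
x_true = np.zeros(100)
x_true[[5, 20, 50, 80]] = [3.0, -3.0, 3.0, -3.0]
y = A @ x_true
x_hat = fista_lasso(A, y, lam=0.1)
```

In an antenna-selection setting, the large-magnitude entries of the recovered sparse vector would indicate which antennas to keep active; how the sparse vector is tied to the beamformer is specific to the paper and is not reproduced here.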
Abstract: This work studies a low-complexity design for reconfigurable intelligent surface (RIS)-aided multiuser multiple-input multiple-output systems. The base station (BS) applies receive antenna selection to connect a subset of its antennas to the available radio frequency chains. For this setting, the BS switching network, uplink precoders, and RIS phase-shifts are jointly designed such that the uplink sum-rate is maximized. The principal design problem reduces to an NP-hard mixed-integer optimization. We hence invoke the weighted minimum mean squared error technique and the penalty dual decomposition method to develop a tractable iterative algorithm that approximates the optimal design effectively. Our numerical investigations verify the efficiency of the proposed algorithm and its superior performance as compared with the benchmark.
Abstract: Classical antenna selection schemes require instantaneous channel state information (CSI), which leads to high signaling overhead in the system. This work proposes a novel joint receive antenna selection and precoding scheme for multiuser multiple-input multiple-output uplink transmission that relies only on the long-term statistics of the CSI. The proposed scheme designs the switching network and the uplink precoders such that the expected long-term throughput of the system is maximized. Invoking results from random matrix theory, we derive a closed-form expression for the expected throughput of the system. We then develop a tractable iterative algorithm to tackle the throughput maximization problem, capitalizing on the alternating optimization and majorization-maximization (MM) techniques. Numerical results substantiate the efficiency of the proposed approach and its superior performance as compared with the baseline.
Abstract: This paper develops a class of low-complexity device scheduling algorithms for over-the-air federated learning via the method of matching pursuit. The proposed scheme closely tracks the near-optimal performance achieved by difference-of-convex programming, and significantly outperforms the well-known benchmark algorithms based on convex relaxation. Compared to the state-of-the-art, the proposed scheme imposes a drastically lower computational load on the system: for $K$ devices and $N$ antennas at the parameter server, the benchmark complexity scales with $\left(N^2+K\right)^3 + N^6$, while the complexity of the proposed scheme scales with $K^p N^q$ for some $0 < p,q \leq 2$. The efficiency of the proposed scheme is confirmed via numerical experiments on the CIFAR-10 dataset.
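To get a feel for the gap between the two scaling laws quoted above, the expressions can be evaluated directly. The sketch below is illustrative only: it compares growth orders and ignores all constant factors, the function names `benchmark_cost` and `proposed_cost` are hypothetical, the values of $N$ and $K$ are arbitrary, and $p = q = 2$ is assumed as the upper end of the stated range (the abstract does not give the exact exponents).

```python
def benchmark_cost(N, K):
    # Scaling order stated for the benchmark: (N^2 + K)^3 + N^6.
    return (N**2 + K) ** 3 + N**6

def proposed_cost(N, K, p=2, q=2):
    # Scaling order stated for the proposed scheme: K^p * N^q with 0 < p, q <= 2.
    # p = q = 2 is an assumed worst case within that range.
    return K**p * N**q

# Arbitrary example sizes (assumptions, not values from the paper).
for N, K in [(2, 4), (8, 20), (16, 50)]:
    ratio = benchmark_cost(N, K) / proposed_cost(N, K)
    print(f"N={N}, K={K}: benchmark/proposed scaling ratio = {ratio:.1f}")
```

Because the benchmark expression contains an $N^6$ term while the proposed one grows at most as $N^2$ in the array size, the ratio widens quickly as $N$ increases.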