Abstract:Second-order methods are widely adopted to improve the convergence rate of learning algorithms. In federated learning (FL), these methods require the clients to share their local Hessian matrices with the parameter server (PS), which comes at a prohibitive communication cost. A classical solution to this issue is to approximate the global Hessian matrix from first-order information. Unlike in idealized networks, this solution does not perform effectively in over-the-air FL settings, where the PS receives noisy versions of the local gradients. This paper introduces a novel second-order FL framework tailored for wireless channels. The pivotal innovation lies in the PS's capability to directly estimate the global Hessian matrix from the received noisy local gradients via a non-parametric method: the PS models the unknown Hessian matrix as a Gaussian process, and then uses the temporal relation between the gradients and the Hessian, along with the channel model, to find a stochastic estimator for the global Hessian matrix. We refer to this method as Gaussian process-based Hessian modeling for wireless FL (GP-FL) and show that it exhibits a linear-quadratic convergence rate. Numerical experiments on various datasets demonstrate that GP-FL outperforms all classical baseline first- and second-order FL approaches.
Abstract:Quantization is a common approach to mitigate the communication cost of federated learning (FL). In practice, the quantized local parameters are further encoded via an entropy coding technique, such as Huffman coding, for efficient data compression. In this case, the exact communication overhead is determined by the bit rate of the encoded gradients. Recognizing this fact, this work deviates from the existing approaches in the literature and develops a novel quantized FL framework, called \textbf{r}ate-\textbf{c}onstrained \textbf{fed}erated learning (RC-FED), in which the gradients are quantized subject to both fidelity and data rate constraints. We formulate this scheme as a joint optimization in which the quantization distortion is minimized while the rate of encoded gradients is kept below a target threshold. This enables a tunable trade-off between quantization distortion and communication cost. We analyze the convergence behavior of RC-FED and show its superior performance against baseline quantized FL schemes on several datasets.
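The trade-off between quantization distortion and encoded bit rate described above can be illustrated with a minimal sketch. It is not the RC-FED scheme itself: it assumes a plain uniform scalar quantizer, synthetic Gaussian "gradients", and the empirical entropy of the quantizer indices as a proxy for the rate of an entropy coder (the average length of a symbol-wise Huffman code lies within one bit above this entropy). All function names are hypothetical.

```python
import math
import random
from collections import Counter

def uniform_quantize(values, step):
    """Map each value to the index of the nearest multiple of `step`."""
    return [round(v / step) for v in values]

def empirical_entropy_bits(symbols):
    """Empirical entropy in bits per symbol (proxy for the entropy-coded rate)."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def mean_squared_error(values, indices, step):
    """Average squared distance between values and their reconstructions."""
    return sum((v - i * step) ** 2 for v, i in zip(values, indices)) / len(values)

def rate_distortion(values, step):
    """Return (rate in bits per symbol, distortion) for a given step size."""
    indices = uniform_quantize(values, step)
    return empirical_entropy_bits(indices), mean_squared_error(values, indices, step)

# Assumption: synthetic standard-normal entries stand in for a local gradient.
random.seed(0)
gradient = [random.gauss(0.0, 1.0) for _ in range(10_000)]

for step in (0.25, 0.5, 1.0):
    rate, distortion = rate_distortion(gradient, step)
    print(f"step={step}: rate={rate:.3f} bits/symbol, distortion={distortion:.4f}")
```

A coarser step lowers the rate and raises the distortion. A rate-constrained design would pick the quantizer that minimizes distortion among those whose rate stays below the target threshold.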
Abstract:We revisit existing linear computation coding (LCC) algorithms, and introduce a new framework that measures the computational cost of computing multidimensional linear functions, not only in terms of the number of additions, but also with respect to their suitability for parallel processing. Utilizing directed acyclic graphs, which correspond to signal flow graphs in hardware, we propose a novel LCC algorithm that controls the trade-off between the total number of operations and their parallel executability. Numerical evaluations show that the proposed algorithm, constrained to a fully parallel structure, outperforms existing schemes.
Abstract:Linear computation coding is concerned with the compression of multidimensional linear functions, i.e., with reducing the computational effort of multiplying an arbitrary vector by an arbitrary, but known, constant matrix. This paper advances over the state of the art, which is based on a discrete matching pursuit (DMP) algorithm, by a step-wise optimal search. While this search offers significant performance gains over DMP, it is computationally infeasible for large matrices and high accuracy. Therefore, a reduced-state algorithm is introduced that offers performance superior to DMP, while still being computationally feasible even for large matrices. Depending on the matrix size, the performance gain over DMP is on the order of at least 10%.
Abstract:Classical antenna selection schemes require instantaneous channel state information (CSI). This leads to high signaling overhead in the system. This work proposes a novel joint receive antenna selection and precoding scheme for multiuser multiple-input multiple-output uplink transmission that relies only on the long-term statistics of the CSI. The proposed scheme designs the switching network and the uplink precoders such that the expected long-term throughput of the system is maximized. Invoking results from random matrix theory, we derive a closed-form expression for the expected throughput of the system. We then develop a tractable iterative algorithm to tackle the throughput maximization problem, capitalizing on alternating optimization and majorization-maximization (MM) techniques. Numerical results substantiate the efficiency of the proposed approach and its superior performance as compared with the baseline.
Abstract:This work studies a low-complexity design for reconfigurable intelligent surface (RIS)-aided multiuser multiple-input multiple-output systems. The base station (BS) applies receive antenna selection to connect a subset of its antennas to the available radio frequency chains. For this setting, the BS switching network, uplink precoders, and RIS phase-shifts are jointly designed such that the uplink sum-rate is maximized. The principal design problem reduces to an NP-hard mixed-integer optimization. We hence invoke the weighted minimum mean squared error technique and the penalty dual decomposition method to develop a tractable iterative algorithm that approximates the optimal design effectively. Our numerical investigations verify the efficiency of the proposed algorithm and its superior performance as compared with the benchmark.
Abstract:Since the introduction of massive MIMO (mMIMO), the design of a transceiver with feasible complexity has been a challenging problem. Initially, it was believed that the main issue in this respect is the overall RF-cost. However, as mMIMO is increasingly becoming a key technology for future wireless networks, it has been realized that the RF-cost is only one of many implementational challenges and design trade-offs. In this paper, we present, analyze, and compare various novel mMIMO architectures, considering recent emerging technologies such as intelligent surface-assisted and Rotman lens-based architectures. These are compared to the conventional fully digital (FD) and hybrid analog-digital beamforming (HADB) approaches. To enable a fair comparison, we account for various hardware imperfections and losses and utilize a novel, universal algorithm for signal precoding. Based on our thorough investigations, we draw a generic efficiency-to-quality trade-off for various mMIMO architectures. We find that in a typical cellular communication setting the reflect/transmit array-based architectures achieve the best overall trade-off. Further, we show that in a qualitative ranking the power efficiency of the considered architectures is independent of the frequency range.
Abstract:In colocated compressive sensing MIMO radar, the measurement matrix is specified by antenna placement. To guarantee an acceptable recovery performance, this measurement matrix should satisfy certain properties, e.g., a small coherence. Prior work in the literature often employs randomized placement algorithms which optimize the prior distribution of antenna locations. The performance of these algorithms is suboptimal, as they can be easily enhanced via expurgation. In this paper, we suggest an iterative antenna placement algorithm which determines the antenna locations deterministically. The proposed algorithm jointly locates the antenna elements on the transmit and receive arrays such that the coherence of the resulting measurement matrix is minimized. Numerical simulations demonstrate that the proposed algorithm significantly outperforms the benchmark, even after expurgation.
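The coherence criterion mentioned above can be made concrete with a short sketch. It assumes the standard definition of mutual coherence, namely the largest normalized absolute inner product between two distinct columns of the measurement matrix. How the matrix is built from the antenna locations is not specified in the abstract, so the function (a hypothetical name) simply accepts any matrix given as a list of columns.

```python
import math

def mutual_coherence(columns):
    """Largest normalized absolute inner product between two distinct columns.

    `columns` is a list of equal-length sequences of real or complex numbers,
    each with nonzero norm.
    """
    norms = [math.sqrt(sum(abs(x) ** 2 for x in col)) for col in columns]
    worst = 0.0
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            # Hermitian inner product between column i and column j
            inner = sum(a.conjugate() * b for a, b in zip(columns[i], columns[j]))
            worst = max(worst, abs(inner) / (norms[i] * norms[j]))
    return worst

# Orthogonal columns give coherence 0; parallel columns give coherence 1.
print(mutual_coherence([[1, 0], [0, 1]]))
print(mutual_coherence([[1, 0], [1, 1]]))
```

A placement algorithm of the kind described would evaluate such a quantity for candidate antenna locations and keep the candidates that reduce it.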
Abstract:This paper develops a class of low-complexity device scheduling algorithms for over-the-air federated learning via the method of matching pursuit. The proposed scheme closely tracks the near-optimal performance achieved by difference-of-convex programming, and significantly outperforms the well-known benchmark algorithms based on convex relaxation. Compared to the state of the art, the proposed scheme poses a drastically lower computational load on the system: For $K$ devices and $N$ antennas at the parameter server, the benchmark complexity scales with $\left(N^2+K\right)^3 + N^6$ while the complexity of the proposed scheme scales with $K^p N^q$ for some $0 < p,q \leq 2$. The efficiency of the proposed scheme is confirmed via numerical experiments on the CIFAR-10 dataset.
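To get a feel for the gap between the two scaling laws above, the following sketch evaluates both expressions for a few system sizes. It takes the largest admissible exponents $p = q = 2$ as a pessimistic case for the proposed scheme and ignores all constant factors, which the abstract does not state, so the ratios are indicative only. The function names and the chosen values of $K$ and $N$ are illustrative assumptions.

```python
def benchmark_scaling(num_devices, num_antennas):
    """Benchmark complexity order: (N^2 + K)^3 + N^6."""
    return (num_antennas ** 2 + num_devices) ** 3 + num_antennas ** 6

def proposed_scaling(num_devices, num_antennas, p=2, q=2):
    """Proposed complexity order: K^p * N^q, with 0 < p, q <= 2."""
    return num_devices ** p * num_antennas ** q

# Assumed example sizes (K devices, N antennas); not taken from the paper.
for k, n in [(10, 4), (50, 8), (100, 16)]:
    ratio = benchmark_scaling(k, n) / proposed_scaling(k, n)
    print(f"K={k}, N={n}: benchmark/proposed = {ratio:.1f}")
```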
Abstract:Unlike the classical linear model, nonlinear generative models have been addressed sparsely in the literature. This work aims to bring attention to these models and their secrecy potential. To this end, we invoke the replica method to derive the asymptotic normalized cross entropy in an inverse probability problem whose generative model is described by a Gaussian random field with a generic covariance function. Our derivations further demonstrate the asymptotic statistical decoupling of Bayesian inference algorithms and specify the decoupled setting for a given nonlinear model. The replica solution shows that strictly nonlinear models exhibit an all-or-nothing phase transition: There exists a critical load at which the optimal Bayesian inference changes from perfect to uncorrelated learning. This finding leads to the design of a new secure coding scheme which achieves the secrecy capacity of the wiretap channel. The proposed coding has a significantly smaller codebook size compared to the random coding scheme of Wyner. This interesting result implies that strictly nonlinear generative models are perfectly secured without any secure coding. We justify this latter statement through the analysis of an illustrative model for perfectly secure and reliable inference.