Abstract:Communication-constrained algorithms for decentralized learning and optimization rely on local updates coupled with the exchange of compressed signals. In this context, differential quantization is an effective technique to mitigate the negative impact of compression by leveraging correlations between successive iterates. In addition, the use of error feedback, which consists of incorporating the compression error into subsequent steps, is a powerful mechanism to compensate for the bias caused by the compression. Under error feedback, performance guarantees in the literature have so far focused on algorithms employing a fusion center or a special class of contractive compressors that cannot be implemented with a finite number of bits. In this work, we propose a new decentralized communication-efficient learning approach that blends differential quantization with error feedback. The approach is specifically tailored for decentralized learning problems where agents have individual risk functions to minimize subject to subspace constraints that require the minimizers across the network to lie in low-dimensional subspaces. This constrained formulation includes consensus or single-task optimization as special cases, and allows for more general task relatedness models such as multitask smoothness and coupled optimization. We show that, under some general conditions on the compression noise, and for sufficiently small step-sizes $\mu$, the resulting communication-efficient strategy is stable both in terms of mean-square error and average bit rate: by reducing $\mu$, it is possible to keep the estimation errors small (on the order of $\mu$) without increasing indefinitely the bit rate as $\mu\rightarrow 0$. The results establish that, in the small step-size regime and with a finite number of bits, it is possible to attain the performance achievable in the absence of compression.
Abstract:Most algorithms for decentralized learning employ a consensus or diffusion mechanism to drive agents to a common solution of a global optimization problem. Generally this takes the form of linear averaging, at a rate of contraction determined by the mixing rate of the underlying network topology. For very sparse graphs this can yield a bottleneck, slowing down the convergence of the learning algorithm. We show that a sequence of matrices achieving finite-time consensus can be learned for unknown graph topologies in a decentralized manner by solving a constrained matrix factorization problem. We demonstrate numerically the benefit of the resulting scheme in both structured and unstructured graphs.
Abstract:Distributed learning paradigms, such as federated or decentralized learning, allow a collection of agents to solve global learning and optimization problems through limited local interactions. Most such strategies rely on a mixture of local adaptation and aggregation steps, either among peers or at a central fusion center. Classically, aggregation in distributed learning is based on averaging, which is statistically efficient, but susceptible to attacks by even a small number of malicious agents. This observation has motivated a number of recent works, which develop robust aggregation schemes by employing robust variations of the mean. We present a new attack based on sensitivity curve maximization (SCM), and demonstrate that it is able to disrupt existing robust aggregation schemes by injecting small, but effective perturbations.
Abstract:Classical paradigms for distributed learning, such as federated or decentralized gradient descent, employ consensus mechanisms to enforce homogeneity among agents. While these strategies have proven effective in i.i.d. scenarios, they can result in significant performance degradation when agents follow heterogeneous objectives or data. Distributed strategies for multitask learning, on the other hand, induce relationships between agents in a more nuanced manner, and encourage collaboration without enforcing consensus. We develop a generalization of the exact diffusion algorithm for subspace constrained multitask learning over networks, and derive an accurate expression for its mean-squared deviation when utilizing noisy gradient approximations. We verify numerically the accuracy of the predicted performance expressions, as well as the improved performance of the proposed approach over alternatives based on approximate projections.
Abstract:The vulnerability of machine learning models to adversarial attacks has been attracting considerable attention in recent years. Most existing studies focus on the behavior of stand-alone single-agent learners. In comparison, this work studies adversarial training over graphs, where individual agents are subjected to perturbations of varied strength levels across space. It is expected that interactions by linked agents, and the heterogeneity of the attack models that are possible over the graph, can help enhance robustness in view of the coordination power of the group. Using a min-max formulation of diffusion learning, we develop a decentralized adversarial training framework for multi-agent systems. We analyze the convergence properties of the proposed scheme for both convex and non-convex environments, and illustrate the enhanced robustness to adversarial attacks.
Abstract:This work focuses on adversarial learning over graphs. We propose a general adversarial training framework for multi-agent systems using diffusion learning. We analyze the convergence properties of the proposed scheme for convex optimization problems, and illustrate its enhanced robustness to adversarial attacks.
Abstract:We study the privatization of distributed learning and optimization strategies. We focus on differential privacy schemes and study their effect on performance. We show that the popular additive random perturbation scheme degrades performance because it is not well-tuned to the graph structure. For this reason, we exploit two alternative graph-homomorphic constructions and show that they improve performance while guaranteeing privacy. Moreover, contrary to most earlier studies, the gradient of the risks is not assumed to be bounded (a condition that rarely holds in practice; e.g., quadratic risk). We avoid this condition and still devise a differentially private scheme with high probability. We examine optimization and learning scenarios and illustrate the theoretical findings through simulations.
Abstract:This work studies networked agents cooperating to track a dynamical state of nature under partial information. The proposed algorithm is a distributed Bayesian filtering algorithm for finite-state hidden Markov models (HMMs). It can be used for sequential state estimation tasks, as well as for modeling opinion formation over social networks under dynamic environments. We show that the disagreement with the optimal centralized solution is asymptotically bounded for the class of geometrically ergodic state transition models, which includes rapidly changing models. We also derive recursions for calculating the probability of error and establish convergence under Gaussian observation models. Simulations are provided to illustrate the theory and to compare against alternative approaches.
Abstract:We study the generation of dependent random numbers in a distributed fashion in order to enable privatized distributed learning by networked agents. We propose a method that we refer to as local graph-homomorphic processing; it relies on the construction of particular noises over the edges to ensure a certain level of differential privacy. We show that the added noise does not affect the performance of the learned model. This is a significant improvement to previous works on differential privacy for distributed algorithms, where the noise was added in a less structured manner without respecting the graph topology and has often led to performance deterioration. We illustrate the theoretical results by considering a linear regression problem over a network of agents.
Abstract:The article reviews significant advances in networked signal and information processing, which have enabled in the last 25 years extending decision making and inference, optimization, control, and learning to the increasingly ubiquitous environments of distributed agents. As these interacting agents cooperate, new collective behaviors emerge from local decisions and actions. Moreover, and significantly, theory and applications show that networked agents, through cooperation and sharing, are able to match the performance of cloud or federated solutions, while preserving privacy, increasing resilience, and saving resources.