Abstract:Explainable AI is a crucial component for edge services, as it ensures reliable decision making based on complex AI models. Surrogate models are a prominent approach of XAI where human-interpretable models, such as a linear regression model, are trained to approximate a complex (black-box) model's predictions. This paper delves into the balance between the predictive accuracy of complex AI models and their approximation by surrogate ones, advocating that both these models benefit from being learned simultaneously. We derive a joint (bi-level) training scheme for both models and we introduce a new algorithm based on multi-objective optimization (MOO) to simultaneously minimize both the complex model's prediction error and the error between its outputs and those of the surrogate. Our approach leads to improvements that exceed 99% in the approximation of the black-box model through the surrogate one, as measured by the metric of Fidelity, for a compromise of less than 3% absolute reduction in the black-box model's predictive accuracy, compared to single-task and multi-task learning baselines. By improving Fidelity, we can derive more trustworthy explanations of the complex model's outcomes from the surrogate, enabling reliable AI applications for intelligent services at the network edge.
Abstract:In Federated Learning (FL), datasets across clients tend to be heterogeneous or personalized, and this poses challenges to the convergence of standard FL schemes that do not account for personalization. To address this, we present a new approach for personalized FL that achieves exact stochastic gradient descent (SGD) minimization. We start from the FedPer (Arivazhagan et al., 2019) neural network (NN) architecture for personalization, whereby the NN has two types of layers: the first ones are the common layers across clients, while the few final ones are client-specific and are needed for personalization. We propose a novel SGD-type scheme where, at each optimization round, randomly selected clients perform gradient-descent updates over their client-specific weights towards optimizing the loss function on their own datasets, without updating the common weights. At the final update, each client computes the joint gradient over both client-specific and common weights and returns the gradient of common parameters to the server. This allows to perform an exact and unbiased SGD step over the full set of parameters in a distributed manner, i.e. the updates of the personalized parameters are performed by the clients and those of the common ones by the server. Our method is superior to FedAvg and FedPer baselines in multi-class classification benchmarks such as Omniglot, CIFAR-10, MNIST, Fashion-MNIST, and EMNIST and has much lower computational complexity per round.