Abstract:Deep learning with differential privacy (DP) has garnered significant attention over the past years, leading to the development of numerous methods aimed at enhancing model accuracy and training efficiency. This paper delves into the problem of training Transformer models with differential privacy. Our treatment is modular: the logic is to `reduce' the problem of training DP Transformer to the more basic problem of training DP vanilla neural nets. The latter is better understood and amenable to many model-agnostic methods. Such `reduction' is done by first identifying the hardness unique to DP Transformer training: the attention distraction phenomenon and a lack of compatibility with existing techniques for efficient gradient clipping. To deal with these two issues, we propose the Re-Attention Mechanism and Phantom Clipping, respectively. We believe that our work not only casts new light on training DP Transformers but also promotes a modular treatment to advance research in the field of differentially private deep learning.
Abstract:The Transformer has emerged as a versatile and effective architecture with broad applications. However, it still remains an open problem how to efficiently train a Transformer model of high utility with differential privacy guarantees. In this paper, we identify two key challenges in learning differentially private Transformers, i.e., heavy computation overhead due to per-sample gradient clipping and unintentional attention distraction within the attention mechanism. In response, we propose DPFormer, equipped with Phantom Clipping and Re-Attention Mechanism, to address these challenges. Our theoretical analysis shows that DPFormer can reduce computational costs during gradient clipping and effectively mitigate attention distraction (which could obstruct the training process and lead to a significant performance drop, especially in the presence of long-tailed data). Such analysis is further corroborated by empirical results on two real-world datasets, demonstrating the efficiency and effectiveness of the proposed DPFormer.
Abstract:Hyperparameter tuning is a common practice in the application of machine learning but is a typically ignored aspect in the literature on privacy-preserving machine learning due to its negative effect on the overall privacy parameter. In this paper, we aim to tackle this fundamental yet challenging problem by providing an effective hyperparameter tuning framework with differential privacy. The proposed method allows us to adopt a broader hyperparameter search space and even to perform a grid search over the whole space, since its privacy loss parameter is independent of the number of hyperparameter candidates. Interestingly, it instead correlates with the utility gained from hyperparameter searching, revealing an explicit and mandatory trade-off between privacy and utility. Theoretically, we show that its additional privacy loss bound incurred by hyperparameter tuning is upper-bounded by the squared root of the gained utility. However, we note that the additional privacy loss bound would empirically scale like a squared root of the logarithm of the utility term, benefiting from the design of doubling step.
Abstract:Recently, the artificial intelligence of things (AIoT) has been gaining increasing attention, with an intriguing vision of providing highly intelligent services through the network connection of things, leading to an advanced AI-driven ecology. However, recent regulatory restrictions on data privacy preclude uploading sensitive local data to data centers and utilizing them in a centralized approach. Directly applying federated learning algorithms in this scenario could hardly meet the industrial requirements of both efficiency and accuracy. Therefore, we propose an efficient industrial federated learning framework for AIoT in terms of a face recognition application. Specifically, we propose to utilize the concept of transfer learning to speed up federated training on devices and further present a novel design of a private projector that helps protect shared gradients without incurring additional memory consumption or computational cost. Empirical studies on a private Asian face dataset show that our approach can achieve high recognition accuracy in only 20 communication rounds, demonstrating its effectiveness in prediction and its efficiency in training.
Abstract:Federated learning, as a privacy-preserving collaborative machine learning paradigm, has been gaining more and more attention in the industry. With the huge rise in demand, there have been many federated learning platforms that allow federated participants to set up and build a federated model from scratch. However, exiting platforms are highly intrusive, complicated, and hard to integrate with built machine learning models. For many real-world businesses that already have mature serving models, existing federated learning platforms have high entry barriers and development costs. This paper presents a simple yet practical federated learning plug-in inspired by ensemble learning, dubbed WrapperFL, allowing participants to build/join a federated system with existing models at minimal costs. The WrapperFL works in a plug-and-play way by simply attaching to the input and output interfaces of an existing model, without the need of re-development, significantly reducing the overhead of manpower and resources. We verify our proposed method on diverse tasks under heterogeneous data distributions and heterogeneous models. The experimental results demonstrate that WrapperFL can be successfully applied to a wide range of applications under practical settings and improves the local model with federated learning at a low cost.
Abstract:Federated learning is a new machine learning paradigm which allows data parties to build machine learning models collaboratively while keeping their data secure and private. While research efforts on federated learning have been growing tremendously in the past two years, most existing works still depend on pre-existing public datasets and artificial partitions to simulate data federations due to the lack of high-quality labeled data generated from real-world edge applications. Consequently, advances on benchmark and model evaluations for federated learning have been lagging behind. In this paper, we introduce a real-world image dataset. The dataset contains more than 900 images generated from 26 street cameras and 7 object categories annotated with detailed bounding box. The data distribution is non-IID and unbalanced, reflecting the characteristic real-world federated learning scenarios. Based on this dataset, we implemented two mainstream object detection algorithms (YOLO and Faster R-CNN) and provided an extensive benchmark on model performance, efficiency, and communication in a federated learning setting. Both the dataset and algorithms are made publicly available.