Abstract:Spatiotemporal forecasting has emerged as an indispensable building block of diverse smart city applications, such as intelligent transportation and smart energy management. Recent work has shown that the performance of spatiotemporal forecasting can be significantly improved by integrating knowledge from geo-distributed time series data in different domains, e.g., enhancing real-estate appraisal with human mobility data, or jointly predicting taxi and bike demand. While effective, existing approaches assume a centralized data collection and exploitation environment, overlooking the privacy and commercial-interest concerns associated with data owned by different parties. In this paper, we investigate multi-party collaborative spatiotemporal forecasting without direct access to multi-source private data. This task is challenging because of 1) cross-domain feature heterogeneity and 2) cross-client geographical heterogeneity, which make standard horizontal or vertical federated learning inapplicable. To this end, we propose a Heterogeneous SpatioTemporal Federated Learning (HSTFL) framework that enables multiple clients to collaboratively harness geo-distributed time series data from different domains while preserving privacy. Specifically, we first devise vertical federated spatiotemporal representation learning to locally preserve the spatiotemporal dependencies within each participant's data and generate effective representations for heterogeneous data. We then propose a cross-client virtual node alignment block that incorporates cross-client spatiotemporal dependencies via a multi-level knowledge fusion scheme. Extensive privacy analysis and experimental evaluations demonstrate that HSTFL not only effectively resists inference attacks but also provides a significant improvement over various baselines.
Abstract:As an essential tool of secure distributed machine learning, vertical federated learning (VFL) based on homomorphic encryption (HE) suffers from severe efficiency problems due to data inflation and time-consuming operations. To this end, we propose PackVFL, an efficient VFL framework based on packed HE (PackedHE), to accelerate existing HE-based VFL algorithms. PackVFL packs multiple cleartexts into one ciphertext and supports single-instruction-multiple-data (SIMD)-style parallelism. We focus on designing a high-performance matrix multiplication (MatMult) method, since MatMult takes up most of the ciphertext computation time in HE-based VFL. Devising the MatMult method is also challenging for PackedHE because a slight difference in the packing scheme can substantially affect its computation and communication costs. Without a domain-specific design, directly applying SOTA MatMult methods is unlikely to achieve optimal performance. Therefore, we make a three-fold design: 1) we systematically explore the current design space of MatMult and quantify the complexity of existing approaches to provide guidance; 2) we propose a hybrid MatMult method according to the unique characteristics of VFL; 3) we adaptively apply our hybrid method in representative VFL algorithms, leveraging distinctive algorithmic properties to further improve efficiency. As the batch size, feature dimension, and model size of VFL scale up, PackVFL consistently delivers better performance. Empirically, PackVFL accelerates existing VFL algorithms by up to 51.52X end to end, a 34.51X greater speedup than directly applying SOTA MatMult methods.
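To make the SIMD-style packing idea concrete, the following is a minimal plaintext sketch of the well-known diagonal method for matrix-vector multiplication over packed slots. It is not PackVFL's hybrid MatMult method, which the abstract does not specify, and it performs no encryption: Python lists stand in for ciphertext slots, and the helper names (`rotate`, `slotwise_mul`, `slotwise_add`, `diagonal_matvec`) are hypothetical. The point is only that a square matrix-vector product can be expressed with the three operations a packed ciphertext typically supports: slot rotation, slot-wise multiplication, and slot-wise addition.

```python
from typing import List

Slots = List[int]  # stand-in for the slots of one packed ciphertext (assumption)


def rotate(slots: Slots, k: int) -> Slots:
    """Cyclic left rotation by k slots, mimicking a packed-HE rotation."""
    k %= len(slots)
    return slots[k:] + slots[:k]


def slotwise_mul(a: Slots, b: Slots) -> Slots:
    """Element-wise product: one SIMD multiplication over all slots."""
    return [x * y for x, y in zip(a, b)]


def slotwise_add(a: Slots, b: Slots) -> Slots:
    """Element-wise sum: one SIMD addition over all slots."""
    return [x + y for x, y in zip(a, b)]


def diagonal_matvec(matrix: List[List[int]], vec: Slots) -> Slots:
    """Multiply a square n x n matrix by a packed vector.

    Uses n generalized diagonals of the matrix and n rotations of the
    vector, so no individual slot is ever accessed on its own.
    """
    n = len(vec)
    acc = [0] * n
    for d in range(n):
        # d-th generalized diagonal: entry i is matrix[i][(i + d) % n]
        diag = [matrix[i][(i + d) % n] for i in range(n)]
        acc = slotwise_add(acc, slotwise_mul(diag, rotate(vec, d)))
    return acc


M = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
v = [1, 0, 2]
print(diagonal_matvec(M, v))  # [7, 16, 25]
```

In this sketch the number of rotations grows with the matrix dimension, which illustrates why the choice of packing layout can dominate the cost of encrypted MatMult.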
Abstract:Vertical federated learning (VFL) is attracting much attention because it enables cross-silo data cooperation in a privacy-preserving manner. While most research in VFL focuses on linear and tree models, deep models (e.g., neural networks) are not well studied in VFL. In this paper, we focus on SplitNN, a well-known neural network framework in VFL, and identify a trade-off between data security and model performance in SplitNN. Briefly, SplitNN trains the model by exchanging gradients and transformed data. On the one hand, SplitNN suffers from a loss of model performance because multiple parties jointly train the model using transformed data instead of raw data, and a large amount of low-level feature information is discarded. On the other hand, a naive way to improve model performance, aggregating at lower layers in SplitNN (i.e., the data is less transformed and more low-level features are preserved), makes raw data vulnerable to inference attacks. To mitigate this trade-off, we propose a new neural network protocol in VFL called Security Forward Aggregation (SFA). It changes the way the transformed data is aggregated and adopts removable masks to protect the raw data. Experimental results show that networks with SFA achieve both data security and high model performance.
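The idea of aggregating transformed data under removable masks can be illustrated with a small sketch. The abstract does not say how SFA constructs its masks, so the code below assumes the simplest option: additive masks that sum to zero across parties, as in common secure-aggregation schemes. All function names are hypothetical, and integer vectors are used so that the cancellation is exact.

```python
import random
from typing import List

Vector = List[int]


def zero_sum_masks(num_parties: int, dim: int, rng: random.Random) -> List[Vector]:
    """Draw one mask per party such that the masks sum to zero slot by slot.

    Assumption: in a real protocol the parties would derive these masks
    jointly (e.g., from pairwise shared secrets) rather than from one RNG.
    """
    masks = [[rng.randint(-1000, 1000) for _ in range(dim)]
             for _ in range(num_parties - 1)]
    # The last mask cancels the sum of all the others.
    last = [-sum(m[j] for m in masks) for j in range(dim)]
    return masks + [last]


def masked_output(local_output: Vector, mask: Vector) -> Vector:
    """What a party would send: its transformed data plus its mask."""
    return [h + m for h, m in zip(local_output, mask)]


def aggregate(masked_outputs: List[Vector]) -> Vector:
    """Sum the masked outputs; the zero-sum masks cancel in the total."""
    return [sum(column) for column in zip(*masked_outputs)]


# Three parties, each holding a locally transformed vector for one sample.
local_outputs = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
masks = zero_sum_masks(num_parties=3, dim=3, rng=random.Random(0))
sent = [masked_output(h, m) for h, m in zip(local_outputs, masks)]
print(aggregate(sent))  # [111, 222, 333]
```

Under this assumption the aggregator recovers only the sum of the parties' outputs, while each individual message is obscured by its mask.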