Fei Zheng

Copula-based mixture model identification for subgroup clustering with imaging applications

Feb 12, 2025

Fei Zheng, Nicolas Duchateau

Abstract: Model-based clustering techniques have been widely applied to various application areas, while most studies focus on canonical mixtures with a single component distribution form. However, this strict assumption is often hard to satisfy. In this paper, we consider the more flexible Copula-Based Mixture Models (CBMMs) for clustering, which allow heterogeneous component distributions composed of flexible choices of marginal and copula forms. More specifically, we propose an adaptation of the Generalized Iterative Conditional Estimation (GICE) algorithm to identify the CBMMs in an unsupervised manner, where the marginal and copula forms and their parameters are estimated iteratively. GICE is adapted from its original version developed for switching Markov model identification with the choice of realization time. Our CBMM-GICE clustering method is then tested on synthetic two-cluster data (N=2000 samples) with discussion of the factors impacting its convergence. Finally, it is compared to the Expectation Maximization identified mixture models with a single component form on the entire MNIST database (N=70000), and on real cardiac magnetic resonance data (N=276) to illustrate its value for imaging applications.
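
For readers unfamiliar with copula-based mixtures, the density such a model typically takes can be written out using Sklar's theorem. The notation below (K components, weights \pi_k, copula densities c_k with parameters \theta_k, marginal CDFs F_{k,j} and densities f_{k,j}) is assumed for illustration and is not taken from the paper itself.

```latex
p(x_1, \dots, x_d)
  = \sum_{k=1}^{K} \pi_k \,
    c_k\!\bigl(F_{k,1}(x_1), \dots, F_{k,d}(x_d); \theta_k\bigr)
    \prod_{j=1}^{d} f_{k,j}(x_j),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1 .
```

Each component may use a different copula family and different marginal families, which is the heterogeneity the abstract refers to.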

WassFFed: Wasserstein Fair Federated Learning

Nov 11, 2024

Zhongxuan Han, Li Zhang, Chaochao Chen, Xiaolin Zheng, Fei Zheng, Yuyuan Li, Jianwei Yin

Abstract: Federated Learning (FL) employs a training approach to address scenarios where users' data cannot be shared across clients. Achieving fairness in FL is imperative since training data in FL is inherently geographically distributed among diverse user groups. Existing research on fairness predominantly assumes access to the entire training data, making direct transfer to FL challenging. However, the limited existing research on fairness in FL does not effectively address two key challenges: (CH1) current methods fail to deal with the inconsistency between fair optimization results obtained with surrogate functions and fair classification results; (CH2) directly aggregating local fair models does not always yield a globally fair model due to non-independent and identically distributed (non-IID) data among clients. To address these challenges, we propose a Wasserstein Fair Federated Learning framework, namely WassFFed. To tackle CH1, we ensure that the outputs of local models, rather than the loss calculated with surrogate functions or classification results with a threshold, remain independent of various user groups. To resolve CH2, we employ a Wasserstein barycenter calculation of all local models' outputs for each user group, bringing local model outputs closer to the global output distribution to ensure consistency between the global model and local models. We conduct extensive experiments on three real-world datasets, demonstrating that WassFFed outperforms existing approaches in striking a balance between accuracy and fairness.

* Submitted to TKDE
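
To give a feel for the Wasserstein barycenter step, here is a minimal sketch for the one-dimensional case, where the 2-Wasserstein barycenter of empirical distributions with equal sample counts is obtained by averaging their sorted samples (that is, their quantile functions). Whether WassFFed uses this closed form is not stated in the abstract; the function name `barycenter_1d` and its arguments are hypothetical.

```python
def barycenter_1d(samples_per_client, weights=None):
    """1D W2 barycenter of equal-size empirical distributions.

    Assumption: each client contributes the same number of scalar
    model outputs. The barycenter is the weighted average of the
    sorted samples (quantile averaging).
    """
    m = len(samples_per_client)
    n = len(samples_per_client[0])
    if any(len(s) != n for s in samples_per_client):
        raise ValueError("all clients must provide the same number of samples")
    if weights is None:
        weights = [1.0 / m] * m  # equal weight per client
    sorted_samples = [sorted(s) for s in samples_per_client]
    # Average the i-th smallest value across clients.
    return [
        sum(w * s[i] for w, s in zip(weights, sorted_samples))
        for i in range(n)
    ]
```

For example, `barycenter_1d([[0.0, 1.0], [3.0, 2.0]])` averages the sorted samples `[0.0, 1.0]` and `[2.0, 3.0]`.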

Input Reconstruction Attack against Vertical Federated Large Language Models

Nov 07, 2023

Fei Zheng

Abstract: Recently, large language models (LLMs) have drawn extensive attention from academia and the public, due to the advent of ChatGPT. While LLMs show their astonishing ability in text generation for various tasks, privacy concerns limit their usage in real-life businesses. More specifically, either the user's inputs (the user sends the query to the model-hosting server) or the model (the user downloads the complete model) itself will be revealed during the usage. Vertical federated learning (VFL) is a promising solution to this kind of problem. It protects both the user's input and the knowledge of the model by splitting the model into a bottom part and a top part, which are maintained by the user and the model provider, respectively. However, in this paper, we demonstrate that in LLMs, VFL fails to protect the user input since it is simple and cheap to reconstruct the input from the intermediate embeddings. Experiments show that even with a commercial GPU, the input sentence can be reconstructed in only one second. We also discuss several possible solutions to enhance the privacy of vertical federated LLMs.

Toward ground-truth optical coherence tomography via three-dimensional unsupervised deep learning processing and data

Nov 07, 2023

Renxiong Wu, Fei Zheng, Meixuan Li, Shaoyan Huang, Xin Ge, Linbo Liu, Yong Liu, Guangming Ni

Abstract: Optical coherence tomography (OCT) can perform non-invasive high-resolution three-dimensional (3D) imaging and has been widely used in biomedical fields, while it is inevitably affected by coherence speckle noise which degrades OCT imaging performance and restricts its applications. Here we present a novel speckle-free OCT imaging strategy, named toward-ground-truth OCT (tGT-OCT), that utilizes unsupervised 3D deep-learning processing and leverages OCT 3D imaging features to achieve speckle-free OCT imaging. Specifically, our proposed tGT-OCT utilizes an unsupervised 3D-convolution deep-learning network trained using random 3D volumetric data to distinguish and separate speckle from real structures in 3D imaging volumetric space; moreover, tGT-OCT effectively further reduces speckle noise and reveals structures that would otherwise be obscured by speckle noise while preserving spatial resolution. Results derived from different samples demonstrated the high-quality speckle-free 3D imaging performance of tGT-OCT and its advancement beyond the previous state-of-the-art.

Defending Label Inference Attacks in Split Learning under Regression Setting

Aug 18, 2023

Haoze Qiu, Fei Zheng, Chaochao Chen, Xiaolin Zheng

Abstract: As a privacy-preserving method for implementing Vertical Federated Learning, Split Learning has been extensively researched. However, numerous studies have indicated that the privacy-preserving capability of Split Learning is insufficient. In this paper, we primarily focus on label inference attacks in Split Learning in the regression setting, which are mainly implemented through the gradient inversion method. To defend against label inference attacks, we propose Random Label Extension (RLE), where labels are extended to obfuscate the label information contained in the gradients, thereby preventing the attacker from utilizing gradients to train an attack model that can infer the original labels. To further minimize the impact on the original task, we propose Model-based adaptive Label Extension (MLE), where original labels are preserved in the extended labels and dominate the training process. The experimental results show that compared to the basic defense methods, our proposed defense methods can significantly reduce the attack model's performance while preserving the original task's performance.

Federated Learning on Non-iid Data via Local and Global Distillation

Jun 26, 2023

Xiaolin Zheng, Senci Ying, Fei Zheng, Jianwei Yin, Longfei Zheng, Chaochao Chen, Fengqin Dong

Abstract: Most existing federated learning algorithms are based on the vanilla FedAvg scheme. However, with the increase of data complexity and the number of model parameters, the amount of communication traffic and the number of iteration rounds for training such algorithms increase significantly, especially in non-independent and identically distributed (non-IID) scenarios, where they do not achieve satisfactory performance. In this work, we propose FedND: federated learning with noise distillation. The main idea is to use knowledge distillation to optimize the model training process. In the client, we propose a self-distillation method to train the local model. In the server, we generate noisy samples for each client and use them to distill other clients. Finally, the global model is obtained by the aggregation of local models. Experimental results show that the algorithm achieves the best performance and is more communication-efficient than state-of-the-art methods.

* Accepted at IEEE ICWS 2023

Reducing Communication for Split Learning by Randomized Top-k Sparsification

May 29, 2023

Fei Zheng, Chaochao Chen, Lingjuan Lyu, Binhui Yao

Abstract: Split learning is a simple solution for Vertical Federated Learning (VFL), which has drawn substantial attention in both research and application due to its simplicity and efficiency. However, communication efficiency is still a crucial issue for split learning. In this paper, we investigate multiple communication reduction methods for split learning, including cut layer size reduction, top-k sparsification, quantization, and L1 regularization. Through analysis of the cut layer size reduction and top-k sparsification, we further propose randomized top-k sparsification, to make the model generalize and converge better. This is done by selecting top-k elements with a large probability while also having a small probability to select non-top-k elements. Empirical results show that compared with other communication-reduction methods, our proposed randomized top-k sparsification achieves better model performance under the same compression level.

* Accepted by IJCAI 2023
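
The selection rule described in the abstract can be sketched as follows. This is an illustrative sampling scheme only, not the paper's exact procedure: the function name `randomized_top_k`, the parameter `alpha` (the probability of drawing from the non-top-k pool), and ranking by absolute value are all assumptions.

```python
import random

def randomized_top_k(values, k, alpha=0.1, rng=None):
    """Keep k entries of `values`, zeroing out the rest.

    Each of the k draws takes a remaining top-k element with
    probability 1 - alpha, and a remaining non-top-k element with
    probability alpha (hypothetical scheme for illustration).
    """
    if not 0 <= k <= len(values):
        raise ValueError("k must be between 0 and len(values)")
    rng = rng or random.Random()
    # Rank indices by magnitude, largest first (assumed criterion).
    order = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)
    top, rest = order[:k], order[k:]
    chosen = set()
    for _ in range(k):
        use_rest = bool(rest) and rng.random() < alpha
        pool = rest if use_rest else top
        chosen.add(pool.pop(rng.randrange(len(pool))))
    return [v if i in chosen else 0.0 for i, v in enumerate(values)]
```

With `alpha=0.0` this reduces to plain top-k sparsification; larger values of `alpha` let more non-top-k elements through while the number of transmitted entries stays at k.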

Making Split Learning Resilient to Label Leakage by Potential Energy Loss

Oct 18, 2022

Fei Zheng, Chaochao Chen, Binhui Yao, Xiaolin Zheng

Abstract: As a practical privacy-preserving learning method, split learning has drawn much attention in academia and industry. However, its security is constantly being questioned since the intermediate results are shared during training and inference. In this paper, we focus on the privacy leakage problem caused by the trained split model, i.e., the attacker can use a few labeled samples to fine-tune the bottom model and obtain quite good performance. To prevent this kind of privacy leakage, we propose the potential energy loss to make the output of the bottom model become a more 'complicated' distribution, by pushing outputs of the same class towards the decision boundary. Therefore, the adversary suffers a large generalization error when fine-tuning the bottom model with only a few leaked labeled samples. Experimental results show that our method significantly lowers the attacker's fine-tuning accuracy, making the split model more resilient to label leakage.

Towards Secure and Practical Machine Learning via Secret Sharing and Random Permutation

Aug 18, 2021

Fei Zheng, Chaochao Chen, Xiaolin Zheng

Abstract: With the increasing demands for privacy protection, privacy-preserving machine learning has been drawing much attention in both academia and industry. However, most existing methods have their limitations in practical applications. On the one hand, although most cryptographic methods are provably secure, they bring heavy computation and communication. On the other hand, the security of many relatively efficient private methods (e.g., federated learning and split learning) is being questioned, since they are not provably secure. Inspired by previous work on privacy-preserving machine learning, we build a privacy-preserving machine learning framework by combining random permutation and arithmetic secret sharing via our compute-after-permutation technique. Since our method reduces the cost for element-wise function computation, it is more efficient than existing cryptographic methods. Moreover, by adopting distance correlation as a metric for privacy leakage, we demonstrate that our method is more secure than previous methods that are not provably secure. Overall, our proposal achieves a good balance between security and efficiency. Experimental results show that our method not only is up to 6x faster and reduces network traffic by up to 85% compared with state-of-the-art cryptographic methods, but also leaks less privacy during the training process compared with methods that are not provably secure.
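
As background for the arithmetic secret sharing mentioned above, here is a minimal sketch of standard additive sharing over the ring of integers modulo 2^64. It does not reproduce the paper's compute-after-permutation technique; the names `share`, `reconstruct`, and `add_shared`, and the choice of modulus, are assumptions for illustration.

```python
import secrets

MOD = 2 ** 64  # ring size (assumed; any fixed modulus works)

def share(x, n_parties=2):
    """Split integer x into n_parties additive shares modulo MOD."""
    shares = [secrets.randbelow(MOD) for _ in range(n_parties - 1)]
    # The last share makes the shares sum to x modulo MOD.
    shares.append((x - sum(shares)) % MOD)
    return shares

def reconstruct(shares):
    """Recover the secret by summing all shares modulo MOD."""
    return sum(shares) % MOD

def add_shared(a, b):
    """Add two shared values; each party adds its own shares locally."""
    return [(x + y) % MOD for x, y in zip(a, b)]
```

Addition needs no communication between parties, since each party only combines the shares it already holds. Element-wise nonlinear functions are the costly part in this setting, which is the cost the abstract says the proposed method reduces.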

Efficient Private Machine Learning by Differentiable Random Transformations

Aug 18, 2020

Fei Zheng

Abstract: With the increasing demands for privacy protection, many privacy-preserving machine learning systems were proposed in recent years. However, most of them cannot be put into production due to their slow training and inference speed caused by the heavy cost of homomorphic encryption and secure multiparty computation (MPC) methods. To circumvent this, I proposed a privacy definition which is suitable for large amounts of data in machine learning tasks. Based on that, I showed that random transformations like linear transformation and random permutation can protect privacy well. Merging random transformations and arithmetic sharing together, I designed a framework for private machine learning with high efficiency and low computation cost.
