Christian Lau

Navigating Data Heterogeneity in Federated Learning: A Semi-Supervised Approach for Object Detection

Oct 27, 2023

Taehyeon Kim, Eric Lin, Junu Lee, Christian Lau, Vaikkunth Mugunthan

Abstract: Federated Learning (FL) has emerged as a potent framework for training models across distributed data sources while maintaining data privacy. Nevertheless, it faces challenges with limited high-quality labels and non-IID client data, particularly in applications like autonomous driving. To address these hurdles, we navigate the uncharted waters of Semi-Supervised Federated Object Detection (SSFOD). We present a pioneering SSFOD framework, designed for scenarios where labeled data reside only at the server while clients possess unlabeled data. Notably, our method represents the inaugural implementation of SSFOD for clients with 0% labeled non-IID data, a stark contrast to previous studies that maintain some subset of labels at each client. We propose FedSTO, a two-stage strategy encompassing Selective Training followed by Orthogonally enhanced full-parameter training, to effectively address data shift (e.g., weather conditions) between server and clients. Our contributions include selectively refining the backbone of the detector to avert overfitting, orthogonality regularization to boost representation divergence, and local EMA-driven pseudo label assignment to yield high-quality pseudo labels. Extensive validation on prominent autonomous driving datasets (BDD100K, Cityscapes, and SODA10M) attests to the efficacy of our approach, demonstrating state-of-the-art results. Remarkably, FedSTO, using just 20-30% of labels, performs nearly as well as fully-supervised centralized training methods.

* NeurIPS 2023
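
The abstract above names three building blocks (an EMA-maintained local model, orthogonality regularization, and pseudo label assignment) without giving their formulas. The sketch below shows a generic textbook form of each in plain Python. It is not FedSTO's implementation: the function names, the `decay` and `threshold` defaults, and the soft penalty `||W W^T - I||_F^2` are all assumptions chosen for illustration.

```python
def ema_update(teacher, student, decay=0.99):
    """Generic EMA step: decay * teacher + (1 - decay) * student, per weight.

    `decay` is an assumed hyperparameter, not a value from the paper.
    """
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def orthogonality_penalty(weight):
    """Soft orthogonality penalty: squared Frobenius norm of W W^T - I.

    `weight` is a matrix given as a list of rows. This is one common form
    of orthogonality regularization; the paper may use a different one.
    """
    n = len(weight)
    penalty = 0.0
    for i in range(n):
        for j in range(n):
            dot = sum(a * b for a, b in zip(weight[i], weight[j]))
            target = 1.0 if i == j else 0.0
            penalty += (dot - target) ** 2
    return penalty


def select_pseudo_labels(detections, threshold=0.5):
    """Keep detections whose confidence score reaches an assumed threshold.

    Each detection is a dict with a "score" key (a hypothetical layout).
    """
    return [d for d in detections if d["score"] >= threshold]
```

In a semi-supervised setup of this kind, the EMA model would typically produce the detections that are filtered into pseudo labels, while the penalty is added to the training loss; how FedSTO combines them is described in the paper itself.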

Does fine-tuning GPT-3 with the OpenAI API leak personally-identifiable information?

Jul 31, 2023

Albert Yu Sun, Eliott Zemour, Arushi Saxena, Udith Vaidyanathan, Eric Lin, Christian Lau, Vaikkunth Mugunthan

Abstract: Machine learning practitioners often fine-tune generative pre-trained models like GPT-3 to improve model performance at specific tasks. Previous works, however, suggest that fine-tuned machine learning models memorize and emit sensitive information from the original fine-tuning dataset. Companies such as OpenAI offer fine-tuning services for their models, but no prior work has conducted a memorization attack on any closed-source models. In this work, we simulate a privacy attack on GPT-3 using OpenAI's fine-tuning API. Our objective is to determine if personally identifiable information (PII) can be extracted from this model. We (1) explore the use of naive prompting methods on a GPT-3 fine-tuned classification model, and (2) design a practical word generation task called Autocomplete to investigate the extent of PII memorization in fine-tuned GPT-3 within a real-world context. Our findings reveal that fine-tuning GPT-3 for both tasks led to the model memorizing and disclosing critical PII obtained from the underlying fine-tuning dataset. To encourage further research, we have made our code and datasets publicly available on GitHub at: https://github.com/albertsun1/gpt3-pii-attacks
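
The abstract does not say how leaked PII is scored. One simple way to check model outputs is to search each completion for PII strings known to be in the fine-tuning set. The sketch below does only that, on strings already in hand, with no API calls. The function names and the verbatim, case-insensitive matching rule are assumptions for illustration, not the authors' evaluation code.

```python
def leaked_pii(completion, known_pii):
    """Return the known PII strings found verbatim in a completion.

    Matching is case-insensitive substring search, an assumed rule that
    misses paraphrased or partially reproduced PII.
    """
    text = completion.lower()
    return sorted(p for p in known_pii if p.lower() in text)


def leak_rate(completions, known_pii):
    """Fraction of completions that contain at least one known PII string."""
    if not completions:
        return 0.0
    hits = sum(1 for c in completions if leaked_pii(c, known_pii))
    return hits / len(completions)
```

For example, `leak_rate(outputs, pii_set)` would be called with a list of generated strings and the set of PII values from the fine-tuning data. A stricter study would also need fuzzy matching and a baseline from the model before fine-tuning.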
