Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Amer Sinha

Scaling Laws for Differentially Private Language Models

Jan 31, 2025

Ryan McKenna, Yangsibo Huang, Amer Sinha, Borja Balle, Zachary Charles, Christopher A. Choquette-Choo, Badih Ghazi, George Kaissis, Ravi Kumar, Ruibo Liu(+2 more)

Abstract:Scaling laws have emerged as important components of large language model (LLM) training as they can predict performance gains through scale, and provide guidance on important hyper-parameter choices that would otherwise be expensive. LLMs also rely on large, high-quality training datasets, like those sourced from (sometimes sensitive) user data. Training models on this sensitive user data requires careful privacy protections like differential privacy (DP). However, the dynamics of DP training are significantly different, and consequently their scaling laws are not yet fully understood. In this work, we establish scaling laws that accurately model the intricacies of DP LLM training, providing a complete picture of the compute-privacy-utility tradeoffs and the optimal training configurations in many settings.

Via

Access Paper or Ask Questions

Balls-and-Bins Sampling for DP-SGD

Dec 21, 2024

Lynn Chua, Badih Ghazi, Charlie Harrison, Ethan Leeman, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Amer Sinha, Chiyuan Zhang

Abstract:We introduce the Balls-and-Bins sampling for differentially private (DP) optimization methods such as DP-SGD. While it has been common practice to use some form of shuffling in DP-SGD implementations, privacy accounting algorithms have typically assumed that Poisson subsampling is used instead. Recent work by Chua et al. (ICML 2024) however pointed out that shuffling based DP-SGD can have a much larger privacy cost in practical regimes of parameters. We show that the Balls-and-Bins sampling achieves the "best-of-both" samplers, namely, the implementation of Balls-and-Bins sampling is similar to that of Shuffling and models trained using DP-SGD with Balls-and-Bins sampling achieve utility comparable to those trained using DP-SGD with Shuffling at the same noise multiplier, and yet, Balls-and-Bins sampling enjoys similar-or-better privacy amplification as compared to Poisson subsampling in practical regimes.

Via

Access Paper or Ask Questions

Scalable DP-SGD: Shuffling vs. Poisson Subsampling

Nov 06, 2024

Lynn Chua, Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Amer Sinha, Chiyuan Zhang

Figure 1 for Scalable DP-SGD: Shuffling vs. Poisson Subsampling

Figure 2 for Scalable DP-SGD: Shuffling vs. Poisson Subsampling

Figure 3 for Scalable DP-SGD: Shuffling vs. Poisson Subsampling

Figure 4 for Scalable DP-SGD: Shuffling vs. Poisson Subsampling

Abstract:We provide new lower bounds on the privacy guarantee of the multi-epoch Adaptive Batch Linear Queries (ABLQ) mechanism with shuffled batch sampling, demonstrating substantial gaps when compared to Poisson subsampling; prior analysis was limited to a single epoch. Since the privacy analysis of Differentially Private Stochastic Gradient Descent (DP-SGD) is obtained by analyzing the ABLQ mechanism, this brings into serious question the common practice of implementing shuffling-based DP-SGD, but reporting privacy parameters as if Poisson subsampling was used. To understand the impact of this gap on the utility of trained machine learning models, we introduce a practical approach to implement Poisson subsampling at scale using massively parallel computation, and efficiently train models with the same. We compare the utility of models trained with Poisson-subsampling-based DP-SGD, and the optimistic estimates of utility when using shuffling, via our new lower bounds on the privacy guarantee of ABLQ with shuffling.

* To appear at NeurIPS 2024

Via

Access Paper or Ask Questions

Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models

Jun 23, 2024

Lynn Chua, Badih Ghazi, Yangsibo Huang, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Amer Sinha, Chulin Xie, Chiyuan Zhang

Figure 1 for Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models

Figure 2 for Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models

Figure 3 for Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models

Figure 4 for Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models

Abstract:Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora. But can these models relate corresponding concepts across languages, effectively being crosslingual? This study evaluates six state-of-the-art LLMs on inherently crosslingual tasks. We observe that while these models show promising surface-level crosslingual abilities on machine translation and embedding space analyses, they struggle with deeper crosslingual knowledge transfer, revealing a crosslingual knowledge barrier in both general (MMLU benchmark) and domain-specific (Harry Potter quiz) contexts. We observe that simple inference-time mitigation methods offer only limited improvement. On the other hand, we propose fine-tuning of LLMs on mixed-language data, which effectively reduces these gaps, even when using out-of-domain datasets like WikiText. Our findings suggest the need for explicit optimization to unlock the full crosslingual potential of LLMs. Our code is publicly available at https://github.com/google-research/crosslingual-knowledge-barriers.

Via

Access Paper or Ask Questions

Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning

Jun 20, 2024

Lynn Chua, Badih Ghazi, Yangsibo Huang, Pritish Kamath, Daogao Liu, Pasin Manurangsi, Amer Sinha, Chiyuan Zhang

Figure 1 for Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning

Figure 2 for Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning

Figure 3 for Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning

Figure 4 for Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning

Abstract:Large language models (LLMs) have emerged as powerful tools for tackling complex tasks across diverse domains, but they also raise privacy concerns when fine-tuned on sensitive data due to potential memorization. While differential privacy (DP) offers a promising solution by ensuring models are `almost indistinguishable' with or without any particular privacy unit, current evaluations on LLMs mostly treat each example (text record) as the privacy unit. This leads to uneven user privacy guarantees when contributions per user vary. We therefore study user-level DP motivated by applications where it necessary to ensure uniform privacy protection across users. We present a systematic evaluation of user-level DP for LLM fine-tuning on natural language generation tasks. Focusing on two mechanisms for achieving user-level DP guarantees, Group Privacy and User-wise DP-SGD, we investigate design choices like data selection strategies and parameter tuning for the best privacy-utility tradeoff.

Via

Access Paper or Ask Questions

How Private is DP-SGD?

Mar 26, 2024

Lynn Chua, Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Amer Sinha, Chiyuan Zhang

Abstract:We demonstrate a substantial gap between the privacy guarantees of the Adaptive Batch Linear Queries (ABLQ) mechanism under different types of batch sampling: (i) Shuffling, and (ii) Poisson subsampling; the typical analysis of Differentially Private Stochastic Gradient Descent (DP-SGD) follows by interpreting it as a post-processing of ABLQ. While shuffling based DP-SGD is more commonly used in practical implementations, it is neither analytically nor numerically amenable to easy privacy analysis. On the other hand, Poisson subsampling based DP-SGD is challenging to scalably implement, but has a well-understood privacy analysis, with multiple open-source numerically tight privacy accountants available. This has led to a common practice of using shuffling based DP-SGD in practice, but using the privacy analysis for the corresponding Poisson subsampling version. Our result shows that there can be a substantial gap between the privacy analysis when using the two types of batch sampling, and thus advises caution in reporting privacy parameters for DP-SGD.

Via

Access Paper or Ask Questions

Training Differentially Private Ad Prediction Models with Semi-Sensitive Features

Jan 26, 2024

Lynn Chua, Qiliang Cui, Badih Ghazi, Charlie Harrison, Pritish Kamath, Walid Krichene, Ravi Kumar, Pasin Manurangsi, Krishna Giri Narra, Amer Sinha(+2 more)

Figure 1 for Training Differentially Private Ad Prediction Models with Semi-Sensitive Features

Figure 2 for Training Differentially Private Ad Prediction Models with Semi-Sensitive Features

Figure 3 for Training Differentially Private Ad Prediction Models with Semi-Sensitive Features

Figure 4 for Training Differentially Private Ad Prediction Models with Semi-Sensitive Features

Abstract:Motivated by problems arising in digital advertising, we introduce the task of training differentially private (DP) machine learning models with semi-sensitive features. In this setting, a subset of the features is known to the attacker (and thus need not be protected) while the remaining features as well as the label are unknown to the attacker and should be protected by the DP guarantee. This task interpolates between training the model with full DP (where the label and all features should be protected) or with label DP (where all the features are considered known, and only the label should be protected). We present a new algorithm for training DP models with semi-sensitive features. Through an empirical evaluation on real ads datasets, we demonstrate that our algorithm surpasses in utility the baselines of (i) DP stochastic gradient descent (DP-SGD) run on all features (known and unknown), and (ii) a label DP algorithm run only on the known features (while discarding the unknown ones).

* 7 pages, 4 figures

Via

Access Paper or Ask Questions

Sparsity-Preserving Differentially Private Training of Large Embedding Models

Nov 14, 2023

Badih Ghazi, Yangsibo Huang, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Amer Sinha, Chiyuan Zhang

Figure 1 for Sparsity-Preserving Differentially Private Training of Large Embedding Models

Figure 2 for Sparsity-Preserving Differentially Private Training of Large Embedding Models

Figure 3 for Sparsity-Preserving Differentially Private Training of Large Embedding Models

Figure 4 for Sparsity-Preserving Differentially Private Training of Large Embedding Models

Abstract:As the use of large embedding models in recommendation systems and language applications increases, concerns over user data privacy have also risen. DP-SGD, a training algorithm that combines differential privacy with stochastic gradient descent, has been the workhorse in protecting user privacy without compromising model accuracy by much. However, applying DP-SGD naively to embedding models can destroy gradient sparsity, leading to reduced training efficiency. To address this issue, we present two new algorithms, DP-FEST and DP-AdaFEST, that preserve gradient sparsity during private training of large embedding models. Our algorithms achieve substantial reductions ($10^6 \times$) in gradient size, while maintaining comparable levels of accuracy, on benchmark real-world datasets.

* Neural Information Processing Systems (NeurIPS) 2023

Via

Access Paper or Ask Questions

Private Ad Modeling with DP-SGD

Nov 21, 2022

Carson Denison, Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Krishna Giri Narra, Amer Sinha, Avinash Varadarajan, Chiyuan Zhang

Figure 1 for Private Ad Modeling with DP-SGD

Figure 2 for Private Ad Modeling with DP-SGD

Figure 3 for Private Ad Modeling with DP-SGD

Figure 4 for Private Ad Modeling with DP-SGD

Abstract:A well-known algorithm in privacy-preserving ML is differentially private stochastic gradient descent (DP-SGD). While this algorithm has been evaluated on text and image data, it has not been previously applied to ads data, which are notorious for their high class imbalance and sparse gradient updates. In this work we apply DP-SGD to several ad modeling tasks including predicting click-through rates, conversion rates, and number of conversion events, and evaluate their privacy-utility trade-off on real-world datasets. Our work is the first to empirically demonstrate that DP-SGD can provide both privacy and utility for ad modeling tasks.

Via

Access Paper or Ask Questions