Lillian Sun

DP-AdamW: Investigating Decoupled Weight Decay and Bias Correction in Private Deep Learning

Nov 11, 2025

Jay Chooi, Kevin Cong, Russell Li, Lillian Sun

Abstract: As deep learning methods increasingly utilize sensitive data on a widespread scale, differential privacy (DP) offers formal guarantees to protect against information leakage during model training. A significant challenge remains in implementing DP optimizers that retain strong performance while preserving privacy. Recent advances introduced ever more efficient optimizers, with AdamW being a popular choice for training deep learning models because of its strong empirical performance. We study DP-AdamW and introduce DP-AdamW-BC, a differentially private variant of the AdamW optimizer with DP bias correction for the second moment estimator. We start by showing theoretical results for privacy and convergence guarantees of DP-AdamW and DP-AdamW-BC. Then, we empirically analyze the behavior of both optimizers across multiple privacy budgets (ε = 1, 3, 7). We find that DP-AdamW outperforms existing state-of-the-art differentially private optimizers like DP-SGD, DP-Adam, and DP-AdamBC, scoring over 15% higher on text classification, up to 5% higher on image classification, and consistently 1% higher on graph node classification. Moreover, we empirically show that incorporating bias correction in DP-AdamW (DP-AdamW-BC) consistently decreases accuracy, in contrast to the improvement of DP-AdamBC over DP-Adam.

* 19 pages, 5 appendices; presented at ICML 2025 DIG-BUGS Workshop
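
To make the two ingredients named in the abstract concrete, here is a minimal sketch of one optimizer step that combines the usual DP recipe (per-sample clipping plus Gaussian noise) with AdamW's decoupled weight decay. It is not the authors' implementation, and the abstract does not give the exact update rule. The function names, hyperparameter defaults, and in particular the form of the optional second-moment correction (subtracting the per-coordinate noise variance and flooring the result) are assumptions for illustration, and no privacy accounting is performed.

```python
import math
import random


def init_state(dim):
    """Hypothetical optimizer state: step counter plus first and second moments."""
    return {"t": 0, "m": [0.0] * dim, "v": [0.0] * dim}


def clip(grad, max_norm):
    """Scale a gradient so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / (norm + 1e-12))
    return [g * scale for g in grad]


def dp_adamw_step(params, per_sample_grads, state, *, lr=1e-3, beta1=0.9,
                  beta2=0.999, eps=1e-8, weight_decay=1e-2, clip_norm=1.0,
                  noise_multiplier=1.0, bias_correct_noise=False,
                  floor=1e-8, rng=random):
    """One illustrative DP-AdamW-style step; returns the updated parameters."""
    batch = len(per_sample_grads)
    dim = len(params)

    # 1. Clip every per-sample gradient, then sum the clipped gradients.
    summed = [0.0] * dim
    for grad in per_sample_grads:
        for i, g in enumerate(clip(grad, clip_norm)):
            summed[i] += g

    # 2. Add Gaussian noise (std = noise_multiplier * clip_norm) and average.
    std = noise_multiplier * clip_norm
    noisy = [(s + rng.gauss(0.0, std)) / batch for s in summed]

    # Variance that the added noise contributes to each averaged coordinate.
    noise_var = (std / batch) ** 2

    state["t"] += 1
    t = state["t"]
    new_params = []
    for i in range(dim):
        # 3. Standard Adam moment estimates, computed on the noisy gradient.
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * noisy[i]
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * noisy[i] ** 2
        m_hat = state["m"][i] / (1 - beta1 ** t)
        v_hat = state["v"][i] / (1 - beta2 ** t)

        if bias_correct_noise:
            # Assumed form of the DP bias correction: remove the noise
            # variance from the second moment and floor the result.
            denom = math.sqrt(max(v_hat - noise_var, floor))
        else:
            denom = math.sqrt(v_hat) + eps

        # 4. Decoupled weight decay: shrink the parameter directly instead of
        #    adding weight_decay * param to the gradient.
        decayed = params[i] * (1 - lr * weight_decay)
        new_params.append(decayed - lr * m_hat / denom)
    return new_params
```

Setting `noise_multiplier=0.0` makes the step deterministic, which is convenient for checking the clipping and weight-decay arithmetic by hand; `bias_correct_noise=True` switches to the assumed BC-style denominator.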

Generalizing Trust: Weak-to-Strong Trustworthiness in Language Models

Dec 31, 2024

Martin Pawelczyk, Lillian Sun, Zhenting Qi, Aounon Kumar, Himabindu Lakkaraju

Abstract: The rapid proliferation of generative AI, especially large language models, has led to their integration into a variety of applications. A key phenomenon known as weak-to-strong generalization, where a strong model trained on a weak model's outputs surpasses the weak model in task performance, has gained significant attention. Yet, whether critical trustworthiness properties such as robustness, fairness, and privacy can generalize similarly remains an open question. In this work, we study this question by examining whether a stronger model can inherit trustworthiness properties when fine-tuned on a weaker model's outputs, a process we term weak-to-strong trustworthiness generalization. To address this, we introduce two foundational training strategies: 1) Weak Trustworthiness Finetuning (Weak TFT), which leverages trustworthiness regularization during the fine-tuning of the weak model, and 2) Weak and Weak-to-Strong Trustworthiness Finetuning (Weak+WTS TFT), which extends regularization to both weak and strong models. Our experimental evaluation on real-world datasets reveals that while some trustworthiness properties, such as fairness, adversarial robustness, and OOD robustness, show significant improvement in transfer when both models are regularized, others like privacy do not exhibit signs of weak-to-strong trustworthiness. As the first study to explore trustworthiness generalization via weak-to-strong generalization, our work provides valuable insights into the potential and limitations of weak-to-strong generalization.

* The first two authors contributed equally
