University of Virginia
Abstract:Differential privacy (DP) is applied when fine-tuning pre-trained large language models (LLMs) to limit leakage of training examples. While most DP research has focused on improving a model's privacy-utility tradeoff, some find that DP can be unfair to or biased against underrepresented groups. In this work, we show the impact of DP on bias in LLMs through empirical analysis. Differentially private training can increase the model bias against protected groups w.r.t AUC-based bias metrics. DP makes it more difficult for the model to differentiate between the positive and negative examples from the protected groups and other groups in the rest of the population. Our results also show that the impact of DP on bias is not only affected by the privacy protection level but also the underlying distribution of the dataset.
Abstract:Current mainstream speaker verification systems are predominantly based on the concept of ``speaker embedding", which transforms variable-length speech signals into fixed-length speaker vectors, followed by verification based on cosine similarity between the embeddings of the enrollment and test utterances. However, this approach suffers from considerable performance degradation in the presence of severe noise and interference speakers. This paper introduces Neural Scoring, a novel framework that re-treats speaker verification as a scoring task using a Transformer-based architecture. The proposed method first extracts an embedding from the enrollment speech and frame-level features from the test speech. A Transformer network then generates a decision score that quantifies the likelihood of the enrolled speaker being present in the test speech. We evaluated Neural Scoring on the VoxCeleb dataset across five test scenarios, comparing it with the state-of-the-art embedding-based approach. While Neural Scoring achieves comparable performance to the state-of-the-art under the benchmark (clean) test condition, it demonstrates a remarkable advantage in the four complex scenarios, achieving an overall 64.53% reduction in equal error rate (EER) compared to the baseline.
Abstract:Large Language Models (LLMs) rely on the contextual information embedded in examples/demonstrations to perform in-context learning (ICL). To mitigate the risk of LLMs potentially leaking private information contained in examples in the prompt, we introduce a novel data-adaptive differentially private algorithm called AdaDPSyn to generate synthetic examples from the private dataset and then use these synthetic examples to perform ICL. The objective of AdaDPSyn is to adaptively adjust the noise level in the data synthesis mechanism according to the inherent statistical properties of the data, thereby preserving high ICL accuracy while maintaining formal differential privacy guarantees. A key innovation in AdaDPSyn is the Precision-Focused Iterative Radius Reduction technique, which dynamically refines the aggregation radius - the scope of data grouping for noise addition - based on patterns observed in data clustering, thereby minimizing the amount of additive noise. We conduct extensive experiments on standard benchmarks and compare AdaDPSyn with DP few-shot generation algorithm (Tang et al., 2023). The experiments demonstrate that AdaDPSyn not only outperforms DP few-shot generation, but also maintains high accuracy levels close to those of non-private baselines, providing an effective solution for ICL with privacy protection.
Abstract:In the field of machine unlearning, certified unlearning has been extensively studied in convex machine learning models due to its high efficiency and strong theoretical guarantees. However, its application to deep neural networks (DNNs), known for their highly nonconvex nature, still poses challenges. To bridge the gap between certified unlearning and DNNs, we propose several simple techniques to extend certified unlearning methods to nonconvex objectives. To reduce the time complexity, we develop an efficient computation method by inverse Hessian approximation without compromising certification guarantees. In addition, we extend our discussion of certification to nonconvergence training and sequential unlearning, considering that real-world users can send unlearning requests at different time points. Extensive experiments on three real-world datasets demonstrate the efficacy of our method and the advantages of certified unlearning in DNNs.
Abstract:Video generation models (VGMs) have demonstrated the capability to synthesize high-quality output. It is important to understand their potential to produce unsafe content, such as violent or terrifying videos. In this work, we provide a comprehensive understanding of unsafe video generation. First, to confirm the possibility that these models could indeed generate unsafe videos, we choose unsafe content generation prompts collected from 4chan and Lexica, and three open-source SOTA VGMs to generate unsafe videos. After filtering out duplicates and poorly generated content, we created an initial set of 2112 unsafe videos from an original pool of 5607 videos. Through clustering and thematic coding analysis of these generated videos, we identify 5 unsafe video categories: Distorted/Weird, Terrifying, Pornographic, Violent/Bloody, and Political. With IRB approval, we then recruit online participants to help label the generated videos. Based on the annotations submitted by 403 participants, we identified 937 unsafe videos from the initial video set. With the labeled information and the corresponding prompts, we created the first dataset of unsafe videos generated by VGMs. We then study possible defense mechanisms to prevent the generation of unsafe videos. Existing defense methods in image generation focus on filtering either input prompt or output results. We propose a new approach called Latent Variable Defense (LVD), which works within the model's internal sampling process. LVD can achieve 0.90 defense accuracy while reducing time and computing resources by 10x when sampling a large number of unsafe prompts.
Abstract:Numerous approaches have been recently proposed for learning fair representations that mitigate unfair outcomes in prediction tasks. A key motivation for these methods is that the representations can be used by third parties with unknown objectives. However, because current fair representations are generally not interpretable, the third party cannot use these fair representations for exploration, or to obtain any additional insights, besides the pre-contracted prediction tasks. Thus, to increase data utility beyond prediction tasks, we argue that the representations need to be fair, yet interpretable. We propose a general framework for learning interpretable fair representations by introducing an interpretable "prior knowledge" during the representation learning process. We implement this idea and conduct experiments with ColorMNIST and Dsprite datasets. The results indicate that in addition to being interpretable, our representations attain slightly higher accuracy and fairer outcomes in a downstream classification task compared to state-of-the-art fair representations.
Abstract:Deploying a well-optimized pre-trained speaker recognition model in a new domain often leads to a significant decline in performance. While fine-tuning is a commonly employed solution, it demands ample adaptation data and suffers from parameter inefficiency, rendering it impractical for real-world applications with limited data available for model adaptation. Drawing inspiration from the success of adapters in self-supervised pre-trained models, this paper introduces a SE/BN adapter to address this challenge. By freezing the core speaker encoder and adjusting the feature maps' weights and activation distributions, we introduce a novel adapter utilizing trainable squeeze-and-excitation (SE) blocks and batch normalization (BN) layers, termed SE/BN adapter. Our experiments, conducted using VoxCeleb for pre-training and 4 genres from CN-Celeb for adaptation, demonstrate that the SE/BN adapter offers significant performance improvement over the baseline and competes with the vanilla fine-tuning approach by tuning just 1% of the parameters.
Abstract:The fusion of raw features from multiple sensors on an autonomous vehicle to create a Bird's Eye View (BEV) representation is crucial for planning and control systems. There is growing interest in using deep learning models for BEV semantic segmentation. Anticipating segmentation errors and improving the explainability of DNNs is essential for autonomous driving, yet it is under-studied. This paper introduces a benchmark for predictive uncertainty quantification in BEV segmentation. The benchmark assesses various approaches across three popular datasets using two representative backbones and focuses on the effectiveness of predicted uncertainty in identifying misclassified and out-of-distribution (OOD) pixels, as well as calibration. Empirical findings highlight the challenges in uncertainty quantification. Our results find that evidential deep learning based approaches show the most promise by efficiently quantifying aleatoric and epistemic uncertainty. We propose the Uncertainty-Focal-Cross-Entropy (UFCE) loss, designed for highly imbalanced data, which consistently improves the segmentation quality and calibration. Additionally, we introduce a vacuity-scaled regularization term that enhances the model's focus on high uncertainty pixels, improving epistemic uncertainty quantification.
Abstract:Reinforcement learning (RL) trains an agent from experiences interacting with the environment. In scenarios where online interactions are impractical, offline RL, which trains the agent using pre-collected datasets, has become popular. While this new paradigm presents remarkable effectiveness across various real-world domains, like healthcare and energy management, there is a growing demand to enable agents to rapidly and completely eliminate the influence of specific trajectories from both the training dataset and the trained agents. To meet this problem, this paper advocates Trajdeleter, the first practical approach to trajectory unlearning for offline RL agents. The key idea of Trajdeleter is to guide the agent to demonstrate deteriorating performance when it encounters states associated with unlearning trajectories. Simultaneously, it ensures the agent maintains its original performance level when facing other remaining trajectories. Additionally, we introduce Trajauditor, a simple yet efficient method to evaluate whether Trajdeleter successfully eliminates the specific trajectories of influence from the offline RL agent. Extensive experiments conducted on six offline RL algorithms and three tasks demonstrate that Trajdeleter requires only about 1.5% of the time needed for retraining from scratch. It effectively unlearns an average of 94.8% of the targeted trajectories yet still performs well in actual environment interactions after unlearning. The replication package and agent parameters are available online.
Abstract:We study gradient flow on the exponential loss for a classification problem with a one-layer softmax attention model, where the key and query weight matrices are trained separately. Under a separability assumption on the data, we show that when gradient flow achieves the minimal loss value, it further implicitly minimizes the nuclear norm of the product of the key and query weight matrices. Such implicit regularization can be described by a Support Vector Machine (SVM) problem with respect to the attention weights. This finding contrasts with prior results showing that the gradient descent induces an implicit regularization on the Frobenius norm on the product weight matrix when the key and query matrices are combined into a single weight matrix for training. For diagonal key and query matrices, our analysis builds upon the reparameterization technique and exploits approximate KKT conditions of the SVM associated with the classification data. Moreover, the results are extended to general weights configurations given proper alignment of the weight matrices' singular spaces with the data features at initialization.