Hamid Kazemi

SALSA: Soup-based Alignment Learning for Stronger Adaptation in RLHF

Nov 04, 2024

Atoosa Chegini, Hamid Kazemi, Iman Mirzadeh, Dong Yin, Maxwell Horton, Moin Nabi, Mehrdad Farajtabar, Keivan Alizadeh

Abstract: In Large Language Model (LLM) development, Reinforcement Learning from Human Feedback (RLHF) is crucial for aligning models with human values and preferences. RLHF traditionally relies on the Kullback-Leibler (KL) divergence between the current policy and a frozen initial policy as a reference, which is added as a penalty in policy optimization algorithms like Proximal Policy Optimization (PPO). While this constraint prevents models from deviating too far from the initial checkpoint, it limits exploration of the reward landscape, reducing the model's ability to discover higher-quality solutions. As a result, policy optimization is often trapped in a narrow region of the parameter space, leading to suboptimal alignment and performance. This paper presents SALSA (Soup-based Alignment Learning for Stronger Adaptation), a novel approach designed to overcome these limitations by creating a more flexible and better-located reference model through weight-space averaging of two independent supervised fine-tuned (SFT) models. This model soup allows larger deviations in KL divergence and the exploration of a promising region of the solution space without sacrificing stability. By leveraging this more robust reference model, SALSA fosters better exploration, achieving higher rewards and improving model robustness, out-of-distribution generalization, and performance. We validate the effectiveness of SALSA through extensive experiments on popular open models (Llama2-7B, Mistral-7B, and Gemma-2B) across various benchmarks (MT-Bench, Arena-Hard, UltraFeedback), where it consistently surpasses PPO by fostering deeper exploration and achieving superior alignment in LLMs.
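
The two ingredients named above, a weight-space average of two SFT checkpoints and a KL penalty against a reference model, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: checkpoints are modeled as plain dicts of flat weight lists, the uniform mixing weight `alpha = 0.5` and the penalty coefficient `beta = 0.1` are assumptions, and the KL term is the usual sample-based estimate (sum of per-token log-probability differences) used in PPO-style RLHF.

```python
from typing import Dict, List, Sequence

# Toy stand-in for a checkpoint: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def make_soup(sft_a: Params, sft_b: Params, alpha: float = 0.5) -> Params:
    """Interpolate two checkpoints of the same architecture in weight space.

    alpha = 0.5 gives the uniform average; the mixing weight is an assumption.
    """
    if sft_a.keys() != sft_b.keys():
        raise ValueError("checkpoints must have identical parameter names")
    return {
        name: [alpha * wa + (1.0 - alpha) * wb for wa, wb in zip(sft_a[name], sft_b[name])]
        for name in sft_a
    }


def kl_penalized_reward(
    reward: float,
    logp_policy: Sequence[float],
    logp_ref: Sequence[float],
    beta: float = 0.1,
) -> float:
    """Reward minus beta times a sample-based estimate of KL(policy || reference).

    logp_policy / logp_ref are per-token log-probabilities of the sampled
    response under the current policy and under the reference model
    (in SALSA's setting, the soup rather than a single SFT checkpoint).
    """
    kl_estimate = sum(p - q for p, q in zip(logp_policy, logp_ref))
    return reward - beta * kl_estimate
```

For example, `make_soup({"w": [1.0, 3.0]}, {"w": [3.0, 5.0]})` yields `{"w": [2.0, 4.0]}`. Only the reference model changes relative to standard PPO-based RLHF; the penalty itself keeps its usual form.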

Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs

Jun 14, 2024

Abhimanyu Hans, Yuxin Wen, Neel Jain, John Kirchenbauer, Hamid Kazemi, Prajwal Singhania, Siddharth Singh, Gowthami Somepalli, Jonas Geiping, Abhinav Bhatele(+1 more)

Abstract: Large language models can memorize and repeat their training data, causing privacy and copyright risks. To mitigate memorization, we introduce a subtle modification to the next-token training objective that we call the goldfish loss. During training, a randomly sampled subset of tokens is excluded from the loss computation. These dropped tokens are not memorized by the model, which prevents verbatim reproduction of a complete chain of tokens from the training set. We run extensive experiments training billion-scale Llama-2 models, both pre-trained and trained from scratch, and demonstrate significant reductions in extractable memorization with little to no impact on downstream benchmarks.

* 9.5 pages, 8 figures, and 1 table in the main body. Code available at https://github.com/ahans30/goldfish-loss
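
The core idea, excluding a subset of token positions from the next-token loss, is sketched below. It assumes each token is dropped independently with probability 1/k (the parameter name `k` and the i.i.d. sampling are assumptions for illustration; the released code may choose dropped positions differently). Per-token log-probabilities are taken as given rather than computed by a model.

```python
import random
from typing import List, Sequence


def goldfish_mask(num_tokens: int, k: int, rng: random.Random) -> List[bool]:
    """Keep-mask for one sequence: each token is dropped independently with probability 1/k.

    True means the token contributes to the loss. Random i.i.d. dropping is an
    assumption for this sketch.
    """
    if k < 2:
        raise ValueError("k must be >= 2 (k = 1 would drop every token)")
    return [rng.random() >= 1.0 / k for _ in range(num_tokens)]


def goldfish_loss(token_logprobs: Sequence[float], keep: Sequence[bool]) -> float:
    """Mean negative log-likelihood over kept tokens only.

    token_logprobs[i] is log p(x_i | x_<i) under the model being trained.
    """
    kept = [lp for lp, k in zip(token_logprobs, keep) if k]
    if not kept:
        return 0.0  # nothing is supervised in this sequence
    return -sum(kept) / len(kept)
```

In a PyTorch training loop the same effect is usually obtained by setting the labels at dropped positions to the `ignore_index` of `torch.nn.functional.cross_entropy` (default `-100`), so no custom loss is needed.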

Generating Potent Poisons and Backdoors from Scratch with Guided Diffusion

Mar 25, 2024

Hossein Souri, Arpit Bansal, Hamid Kazemi, Liam Fowl, Aniruddha Saha, Jonas Geiping, Andrew Gordon Wilson, Rama Chellappa, Tom Goldstein, Micah Goldblum

Abstract: Modern neural networks are often trained on massive datasets that are scraped from the web with minimal human inspection. As a result of this insecure curation pipeline, an adversary can poison or backdoor the resulting model by uploading malicious data to the internet and waiting for a victim to scrape and train on it. Existing approaches for creating poisons and backdoors start with randomly sampled clean data, called base samples, and then modify those samples to craft poisons. However, some base samples may be significantly more amenable to poisoning than others. As a result, we may be able to craft more potent poisons by carefully choosing the base samples. In this work, we use guided diffusion to synthesize base samples from scratch that lead to significantly more potent poisons and backdoors than previous state-of-the-art attacks. Our Guided Diffusion Poisoning (GDP) base samples can be combined with any downstream poisoning or backdoor attack to boost its effectiveness. Our implementation code is publicly available at https://github.com/hsouri/GDP.

What do we learn from inverting CLIP models?

Mar 05, 2024

Hamid Kazemi, Atoosa Chegini, Jonas Geiping, Soheil Feizi, Tom Goldstein

Abstract: We employ an inversion-based approach to examine CLIP models. Our examination reveals that inverting CLIP models results in the generation of images that exhibit semantic alignment with the specified target prompts. We leverage these inverted images to gain insights into various aspects of CLIP models, such as their ability to blend concepts and their inclusion of gender biases. We notably observe instances of NSFW (Not Safe For Work) images during model inversion. This phenomenon occurs even for semantically innocuous prompts, like "a beautiful landscape," as well as for prompts involving the names of celebrities.

* Warning: This paper contains sexually explicit images and language, offensive visuals and terminology, discussions on pornography, gender bias, and other potentially unsettling, distressing, and/or offensive content for certain readers

Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text

Jan 22, 2024

Abhimanyu Hans, Avi Schwarzschild, Valeriia Cherepanova, Hamid Kazemi, Aniruddha Saha, Micah Goldblum, Jonas Geiping, Tom Goldstein

Abstract: Detecting text generated by modern large language models is thought to be hard, as both LLMs and humans can exhibit a wide range of complex behaviors. However, we find that a score based on contrasting two closely related language models is highly accurate at separating human-generated and machine-generated text. Based on this mechanism, we propose a novel LLM detector that only requires simple calculations using a pair of pre-trained LLMs. The method, called Binoculars, achieves state-of-the-art accuracy without any training data. It is capable of spotting machine text from a range of modern LLMs without any model-specific modifications. We comprehensively evaluate Binoculars on a number of text sources and in varied situations. Over a wide range of document types, Binoculars detects over 90% of generated samples from ChatGPT (and other LLMs) at a false positive rate of 0.01%, despite not being trained on any ChatGPT data.

* 20 pages, code available at https://github.com/ahans30/Binoculars
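
A score that "contrasts two closely related language models" can be sketched as a ratio: the log-perplexity of the text under one model, divided by a cross-perplexity term, the average cross-entropy between the two models' next-token distributions. The sketch below works on explicit per-position probability vectors instead of real model outputs. The names `model_a`/`model_b` are ours, and which pretrained model fills each slot (and the decision threshold) should be taken from the paper and its linked code, not from this sketch.

```python
import math
from typing import Sequence

Dist = Sequence[float]  # next-token probabilities over the vocabulary at one position


def log_perplexity(dists: Sequence[Dist], tokens: Sequence[int]) -> float:
    """Mean negative log-probability that a model assigns to the observed tokens."""
    return -sum(math.log(d[t]) for d, t in zip(dists, tokens)) / len(tokens)


def log_cross_perplexity(p_dists: Sequence[Dist], q_dists: Sequence[Dist]) -> float:
    """Mean cross-entropy H(p, q): q's log-probabilities weighted by p's distribution."""
    total = 0.0
    for p, q in zip(p_dists, q_dists):
        total -= sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)
    return total / len(p_dists)


def binoculars_style_score(
    model_a: Sequence[Dist], model_b: Sequence[Dist], tokens: Sequence[int]
) -> float:
    """Log-perplexity under model A over the cross-entropy of A's log-probs under B.

    Lower values point toward machine-generated text; the threshold is tuned separately.
    """
    return log_perplexity(model_a, tokens) / log_cross_perplexity(model_b, model_a)
```

The normalization is the point of the design: a text can have low perplexity simply because its content is predictable, and dividing by how surprised one model is by the other's predictions on the same prefix corrects for that, without any training data.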

What do Vision Transformers Learn? A Visual Exploration

Dec 13, 2022

Amin Ghiasi, Hamid Kazemi, Eitan Borgnia, Steven Reich, Manli Shu, Micah Goldblum, Andrew Gordon Wilson, Tom Goldstein

Abstract: Vision transformers (ViTs) are quickly becoming the de facto architecture for computer vision, yet we understand very little about why they work and what they learn. While existing studies visually analyze the mechanisms of convolutional neural networks, an analogous exploration of ViTs remains challenging. In this paper, we first address the obstacles to performing visualizations on ViTs. Assisted by these solutions, we observe that neurons in ViTs trained with language model supervision (e.g., CLIP) are activated by semantic concepts rather than visual features. We also explore the underlying differences between ViTs and CNNs, and we find that transformers detect image background features, just like their convolutional counterparts, but their predictions depend far less on high-frequency information. On the other hand, both architecture types behave similarly in the way features progress from abstract patterns in early layers to concrete objects in late layers. In addition, we show that ViTs maintain spatial information in all layers except the final layer. In contrast to previous works, we show that the last layer most likely discards the spatial information and behaves as a learned global pooling operation. Finally, we conduct large-scale visualizations on a wide range of ViT variants, including DeiT, CoaT, ConViT, PiT, Swin, and Twin, to validate the effectiveness of our method.

Canary in a Coalmine: Better Membership Inference with Ensembled Adversarial Queries

Oct 19, 2022

Yuxin Wen, Arpit Bansal, Hamid Kazemi, Eitan Borgnia, Micah Goldblum, Jonas Geiping, Tom Goldstein

Abstract: As industrial applications are increasingly automated by machine learning models, enforcing personal data ownership and intellectual property rights requires tracing training data back to their rightful owners. Membership inference algorithms approach this problem by using statistical techniques to discern whether a target sample was included in a model's training set. However, existing methods only utilize the unaltered target sample or simple augmentations of the target to compute statistics. Such a sparse sampling of the model's behavior carries little information, leading to poor inference capabilities. In this work, we use adversarial tools to directly optimize for queries that are discriminative and diverse. Our improvements achieve significantly more accurate membership inference than existing methods, especially in offline scenarios and in the low false-positive regime, which is critical in legal settings. Code is available at https://github.com/YuxinWenRick/canary-in-a-coalmine.
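
The abstract does not spell out how the statistics from many queries are combined. One common offline recipe, assumed here and not taken from the paper, compares the target model's confidence on each query with the confidences of reference models that never trained on the sample, then averages the resulting z-scores over the query ensemble. The query optimization itself (the paper's contribution) is out of scope; the queries and their confidence values are taken as given.

```python
import math
import statistics
from typing import Sequence


def offline_membership_score(
    target_conf: Sequence[float],
    out_conf: Sequence[Sequence[float]],
) -> float:
    """Average per-query z-score of the target model's confidence against OUT models.

    target_conf[q] : target model's confidence statistic on query q
    out_conf[m][q] : the same statistic from reference model m, which never saw the sample
    Higher scores suggest the sample was in the target's training set.
    Requires at least two reference models (needed for a sample standard deviation).
    """
    z_scores = []
    for q, t in enumerate(target_conf):
        ref = [model[q] for model in out_conf]
        mu = statistics.fmean(ref)
        sd = statistics.stdev(ref) or 1e-8  # avoid division by zero when all refs agree
        z_scores.append((t - mu) / sd)
    return statistics.fmean(z_scores)


def logit_scale(p: float, eps: float = 1e-12) -> float:
    """Map a probability of the true label to log(p / (1 - p)), a common confidence statistic."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))
```

Averaging over many diverse queries is what the paper's "ensembled adversarial queries" aim to exploit: each query contributes a noisy vote, and discriminative queries sharpen the separation between members and non-members.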

Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise

Aug 19, 2022

Arpit Bansal, Eitan Borgnia, Hong-Min Chu, Jie S. Li, Hamid Kazemi, Furong Huang, Micah Goldblum, Jonas Geiping, Tom Goldstein

Abstract: Standard diffusion models involve an image transform -- adding Gaussian noise -- and an image restoration operator that inverts this degradation. We observe that the generative behavior of diffusion models is not strongly dependent on the choice of image degradation, and in fact an entire family of generative models can be constructed by varying this choice. Even when using completely deterministic degradations (e.g., blur, masking, and more), the training and test-time update rules that underlie diffusion models can be easily generalized to create generative models. The success of these fully deterministic models calls into question the community's understanding of diffusion models, which relies on noise in either gradient Langevin dynamics or variational inference, and paves the way for generalized diffusion models that invert arbitrary processes. Our code is available at https://github.com/arpitbansal297/Cold-Diffusion-Models
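
A test-time update rule for a deterministic degradation D(x0, s) and a restoration operator R(x_s, s) can be sketched as below. It follows the sampling step x_{s-1} = x_s - D(x0_hat, s) + D(x0_hat, s-1) with x0_hat = R(x_s, s), which is how we read the paper's improved sampler; the naive alternative simply re-degrades the estimate, x_{s-1} = D(x0_hat, s-1). Images are flat lists of floats, the "fade to gray" degradation is a toy, and the restorer passed in is hypothetical (in practice a trained network).

```python
from typing import Callable, List

Vec = List[float]
Degrade = Callable[[Vec, int], Vec]   # D(x0, s): deterministic degradation of severity s
Restore = Callable[[Vec, int], Vec]   # R(x_s, s): estimate of x0 (a trained network in practice)


def cold_sample(x_T: Vec, T: int, degrade: Degrade, restore: Restore) -> Vec:
    """Iterate x_{s-1} = x_s - D(x0_hat, s) + D(x0_hat, s-1) from s = T down to 1."""
    x = list(x_T)
    for s in range(T, 0, -1):
        x0_hat = restore(x, s)
        d_s = degrade(x0_hat, s)
        d_prev = degrade(x0_hat, s - 1)
        x = [xi - a + b for xi, a, b in zip(x, d_s, d_prev)]
    return x


def make_fade(T: int, target: float = 0.5) -> Degrade:
    """Toy degradation: linearly blend every pixel toward a constant gray level."""
    def degrade(x0: Vec, s: int) -> Vec:
        w = s / T
        return [(1.0 - w) * xi + w * target for xi in x0]
    return degrade
```

With an oracle restorer that always returns the clean image, the loop walks the degradation path back exactly, since D(x0, 0) is the identity here. No noise is sampled anywhere, which is the point the abstract makes.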

Plug-In Inversion: Model-Agnostic Inversion for Vision with Data Augmentations

Jan 31, 2022

Amin Ghiasi, Hamid Kazemi, Steven Reich, Chen Zhu, Micah Goldblum, Tom Goldstein

Abstract: Existing techniques for model inversion typically rely on hard-to-tune regularizers, such as total variation or feature regularization, which must be individually calibrated for each network in order to produce adequate images. In this work, we introduce Plug-In Inversion, which relies on a simple set of augmentations and does not require excessive hyper-parameter tuning. Under our proposed augmentation-based scheme, the same set of augmentation hyper-parameters can be used for inverting a wide range of image classification models, regardless of input dimensions or the architecture. We illustrate the practicality of our approach by inverting Vision Transformers (ViTs) and Multi-Layer Perceptrons (MLPs) trained on the ImageNet dataset, tasks which to the best of our knowledge have not been successfully accomplished by any previous work.
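
Augmentation-based inversion replaces explicit regularizers with randomness applied to the input before each forward pass. The toy sketch below shows only that general pattern: gradient ascent on the input of a linear "classifier" score, with a fresh random cyclic shift (jitter) drawn at every step. The linear model, the choice of jitter as the sole augmentation, the step size, and the clipping to [0, 1] are all assumptions; the paper's actual augmentation set and schedule are not reproduced here.

```python
import random
from typing import List


def invert_linear_with_jitter(
    weights: List[float], steps: int, lr: float, max_shift: int, seed: int = 0
) -> List[float]:
    """Gradient ascent on the input of a toy linear scorer under random cyclic shifts.

    score(x) = sum_i weights[i] * x[(i + shift) % n], so the gradient with respect
    to x[j] is weights[(j - shift) % n]. A new shift is sampled at every step.
    """
    rng = random.Random(seed)
    n = len(weights)
    x = [0.0] * n
    for _ in range(steps):
        shift = rng.randint(-max_shift, max_shift)
        grad = [weights[(j - shift) % n] for j in range(n)]
        x = [xj + lr * gj for xj, gj in zip(x, grad)]
        x = [min(1.0, max(0.0, xj)) for xj in x]  # keep "pixels" in a valid range
    return x
```

With `max_shift=0` this is plain input-space gradient ascent; with jitter, the recovered input must score well under every shift, which smooths it in a way an explicit total-variation term would otherwise have to enforce.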
