Kevin Clark

Directly Fine-Tuning Diffusion Models on Differentiable Rewards

Sep 29, 2023

Kevin Clark, Paul Vicol, Kevin Swersky, David J Fleet

Abstract: We present Direct Reward Fine-Tuning (DRaFT), a simple and effective method for fine-tuning diffusion models to maximize differentiable reward functions, such as scores from human preference models. We first show that it is possible to backpropagate the reward function gradient through the full sampling procedure, and that doing so achieves strong performance on a variety of rewards, outperforming reinforcement learning-based approaches. We then propose more efficient variants of DRaFT: DRaFT-K, which truncates backpropagation to only the last K steps of sampling, and DRaFT-LV, which obtains lower-variance gradient estimates for the case when K=1. We show that our methods work well for a variety of reward functions and can be used to substantially improve the aesthetic quality of images generated by Stable Diffusion 1.4. Finally, we draw connections between our approach and prior work, providing a unifying perspective on the design space of gradient-based fine-tuning algorithms.
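
The idea of truncating backpropagation to the last K sampling steps can be illustrated on a toy problem. The sketch below is not the authors' implementation: the scalar update `x = a * x + theta`, the quadratic reward, and all names are hypothetical stand-ins for a diffusion sampler and a reward model, and the derivative is tracked by hand rather than with an autodiff library. Steps before the last K are treated as constants, which plays the role of a stop-gradient.

```python
# Toy illustration of truncated backpropagation through a sampling chain.
# The linear "denoising" step, the quadratic reward, and every name here are
# hypothetical; they stand in for a real diffusion model and reward function.

def draft_k_gradient(theta, x_T, num_steps, k, target, a=0.5):
    """Return (reward, d reward / d theta), differentiating only the last k steps."""
    x = x_T
    dx_dtheta = 0.0  # derivative of the current sample with respect to theta
    for t in range(num_steps, 0, -1):  # t = num_steps, ..., 1
        if t <= k:
            # Inside the last k steps: differentiate x_{t-1} = a * x_t + theta.
            dx_dtheta = a * dx_dtheta + 1.0
        # Earlier steps are treated as constants (a stop-gradient), so
        # dx_dtheta stays at zero until the last k steps begin.
        x = a * x + theta
    reward = -(x - target) ** 2
    dreward_dx = -2.0 * (x - target)
    return reward, dreward_dx * dx_dtheta


# Full backpropagation (k = num_steps) versus the k = 1 truncation.
full = draft_k_gradient(theta=0.0, x_T=0.0, num_steps=4, k=4, target=1.0)
last = draft_k_gradient(theta=0.0, x_T=0.0, num_steps=4, k=1, target=1.0)
print(full, last)
```

In this toy setting the truncated gradient has the same sign as the full one but a smaller magnitude, since the contributions of the earlier steps are dropped.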

Intriguing properties of generative classifiers

Sep 28, 2023

Priyank Jaini, Kevin Clark, Robert Geirhos

Abstract: What is the best paradigm to recognize objects -- discriminative inference (fast but potentially prone to shortcut learning) or using a generative model (slow but potentially more robust)? We build on recent advances in generative modeling that turn text-to-image models into classifiers. This allows us to study their behavior and to compare them against discriminative models and human psychophysical data. We report four intriguing emergent properties of generative classifiers: they show a record-breaking human-like shape bias (99% for Imagen), near human-level out-of-distribution accuracy, state-of-the-art alignment with human classification errors, and they understand certain perceptual illusions. Our results indicate that while the current dominant paradigm for modeling human object recognition is discriminative inference, zero-shot generative models approximate human object recognition data surprisingly well.

Towards Expert-Level Medical Question Answering with Large Language Models

May 16, 2023

Karan Singhal, Tao Tu, Juraj Gottweis, Rory Sayres, Ellery Wulczyn, Le Hou, Kevin Clark, Stephen Pfohl, Heather Cole-Lewis, Darlene Neal (+21 more)

Abstract: Recent artificial intelligence (AI) systems have reached milestones in "grand challenges" ranging from Go to protein-folding. The capability to retrieve medical knowledge, reason over it, and answer medical questions comparably to physicians has long been viewed as one such grand challenge. Large language models (LLMs) have catalyzed significant progress in medical question answering; Med-PaLM was the first model to exceed a "passing" score in US Medical Licensing Examination (USMLE) style questions with a score of 67.2% on the MedQA dataset. However, this and other prior work suggested significant room for improvement, especially when models' answers were compared to clinicians' answers. Here we present Med-PaLM 2, which bridges these gaps by leveraging a combination of base LLM improvements (PaLM 2), medical domain finetuning, and prompting strategies including a novel ensemble refinement approach. Med-PaLM 2 scored up to 86.5% on the MedQA dataset, improving upon Med-PaLM by over 19% and setting a new state-of-the-art. We also observed performance approaching or exceeding state-of-the-art across MedMCQA, PubMedQA, and MMLU clinical topics datasets. We performed detailed human evaluations on long-form questions along multiple axes relevant to clinical applications. In pairwise comparative ranking of 1066 consumer medical questions, physicians preferred Med-PaLM 2 answers to those produced by physicians on eight of nine axes pertaining to clinical utility (p < 0.001). We also observed significant improvements compared to Med-PaLM on every evaluation axis (p < 0.001) on newly introduced datasets of 240 long-form "adversarial" questions to probe LLM limitations. While further studies are necessary to validate the efficacy of these models in real-world settings, these results highlight rapid progress towards physician-level performance in medical question answering.

Text-to-Image Diffusion Models are Zero-Shot Classifiers

Mar 27, 2023

Kevin Clark, Priyank Jaini

Abstract: The excellent generative capabilities of text-to-image diffusion models suggest they learn informative representations of image-text data. However, what knowledge their representations capture is not fully understood, and they have not been thoroughly explored on downstream tasks. We investigate diffusion models by proposing a method for evaluating them as zero-shot classifiers. The key idea is using a diffusion model's ability to denoise a noised image given a text description of a label as a proxy for that label's likelihood. We apply our method to Imagen, using it to probe fine-grained aspects of Imagen's knowledge and comparing it with CLIP's zero-shot abilities. Imagen performs competitively with CLIP on a wide range of zero-shot image classification datasets. Additionally, it achieves state-of-the-art results on shape/texture bias tests and can successfully perform attribute binding while CLIP cannot. Although generative pre-training is prevalent in NLP, visual foundation models often use other methods such as contrastive learning. Based on our findings, we argue that generative pre-training should be explored as a compelling alternative for vision and vision-language problems.
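
The key idea, scoring each label by how well the model denoises the image when conditioned on that label, can be sketched on a toy example. Everything below is an assumption made for illustration: `PROTOTYPES`, `toy_denoise`, the noise levels, and the two-dimensional "images" are hypothetical and stand in for a text-conditioned diffusion model such as Imagen. This is not the procedure from the paper, only a minimal instance of "lowest denoising error wins".

```python
import random

# Hypothetical class prototypes standing in for what a text-conditioned
# denoiser has learned about each label; a real model would be a neural net.
PROTOTYPES = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}


def toy_denoise(noisy, label):
    """Hypothetical denoiser: pulls the noisy input toward the label's prototype."""
    proto = PROTOTYPES[label]
    return [0.5 * n + 0.5 * p for n, p in zip(noisy, proto)]


def classify(image, noise_levels=(0.1, 0.3), samples_per_level=8, seed=0):
    """Return the label whose conditioning gives the lowest total denoising error."""
    rng = random.Random(seed)
    errors = {label: 0.0 for label in PROTOTYPES}
    for sigma in noise_levels:
        for _ in range(samples_per_level):
            noise = [rng.gauss(0.0, 1.0) for _ in image]
            noisy = [x + sigma * e for x, e in zip(image, noise)]
            # Reuse the same noisy image for every label so scores are comparable.
            for label in PROTOTYPES:
                pred = toy_denoise(noisy, label)
                errors[label] += sum((p - x) ** 2 for p, x in zip(pred, image))
    return min(errors, key=errors.get)


print(classify([1.0, 0.0]), classify([0.0, 1.0]))
```

Sharing the same noise sample across labels is a design choice of this sketch: it removes one source of variance when comparing the per-label errors.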

Meta-Learning Fast Weight Language Models

Dec 05, 2022

Kevin Clark, Kelvin Guu, Ming-Wei Chang, Panupong Pasupat, Geoffrey Hinton, Mohammad Norouzi

Abstract: Dynamic evaluation of language models (LMs) adapts model parameters at test time using gradient information from previous tokens and substantially improves LM performance. However, it requires over 3x more compute than standard inference. We present Fast Weight Layers (FWLs), a neural component that provides the benefits of dynamic evaluation much more efficiently by expressing gradient updates as linear attention. A key improvement over dynamic evaluation is that FWLs can also be applied at training time so the model learns to make good use of gradient updates. FWLs can easily be added on top of existing transformer models, require relatively little extra compute or memory to run, and significantly improve language modeling perplexity.

* EMNLP 2022 short paper

nuReality: A VR environment for research of pedestrian and autonomous vehicle interactions

Jan 12, 2022

Paul Schmitt, Nicholas Britten, JiHyun Jeong, Amelia Coffey, Kevin Clark, Shweta Sunil Kothawade, Elena Corina Grigore, Adam Khaw, Christopher Konopka, Linh Pham (+4 more)

Abstract: We present nuReality, a virtual reality (VR) environment designed to test the efficacy of vehicular behaviors to communicate intent during interactions between autonomous vehicles (AVs) and pedestrians at urban intersections. In this project we focus on expressive behaviors as a means for pedestrians to readily recognize the underlying intent of the AV's movements. VR is an ideal tool to use to test these situations as it can be immersive and place subjects into these potentially dangerous scenarios without risk. nuReality provides a novel and immersive virtual reality environment that includes numerous visual details (road and building texturing, parked cars, swaying tree limbs) as well as auditory details (birds chirping, cars honking in the distance, people talking). In these files we present the nuReality environment, its 10 unique vehicle behavior scenarios, and the Unreal Engine and Autodesk Maya source files for each scenario. The files are publicly released as open source at www.nuReality.org, to support the academic community studying the critical AV-pedestrian interaction.

Pre-Training Transformers as Energy-Based Cloze Models

Dec 15, 2020

Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning

Abstract: We introduce Electric, an energy-based cloze model for representation learning over text. Like BERT, it is a conditional generative model of tokens given their contexts. However, Electric does not use masking or output a full distribution over tokens that could occur in a context. Instead, it assigns a scalar energy score to each input token indicating how likely it is given its context. We train Electric using an algorithm based on noise-contrastive estimation and elucidate how this learning objective is closely related to the recently proposed ELECTRA pre-training method. Electric performs well when transferred to downstream tasks and is particularly effective at producing likelihood scores for text: it re-ranks speech recognition n-best lists better than language models and much faster than masked language models. Furthermore, it offers a clearer and more principled view of what ELECTRA learns during pre-training.

* EMNLP 2020

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

Mar 23, 2020

Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning

Abstract: Masked language modeling (MLM) pre-training methods such as BERT corrupt the input by replacing some tokens with [MASK] and then train a model to reconstruct the original tokens. While they produce good results when transferred to downstream NLP tasks, they generally require large amounts of compute to be effective. As an alternative, we propose a more sample-efficient pre-training task called replaced token detection. Instead of masking the input, our approach corrupts it by replacing some tokens with plausible alternatives sampled from a small generator network. Then, instead of training a model that predicts the original identities of the corrupted tokens, we train a discriminative model that predicts whether each token in the corrupted input was replaced by a generator sample or not. Thorough experiments demonstrate this new pre-training task is more efficient than MLM because the task is defined over all input tokens rather than just the small subset that was masked out. As a result, the contextual representations learned by our approach substantially outperform the ones learned by BERT given the same model size, data, and compute. The gains are particularly strong for small models; for example, we train a model on one GPU for 4 days that outperforms GPT (trained using 30x more compute) on the GLUE natural language understanding benchmark. Our approach also works well at scale, where it performs comparably to RoBERTa and XLNet while using less than 1/4 of their compute and outperforms them when using the same amount of compute.

* ICLR 2020
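
The replaced token detection task described in the ELECTRA abstract can be sketched as a data-construction step. This is a minimal illustration under stated assumptions, not the released ELECTRA code: `make_rtd_example` is a hypothetical helper, a uniform draw from the vocabulary stands in for the small generator network, and the 15% replacement rate is chosen only for the example. The point it shows is that every input position receives a binary label, not only the corrupted ones.

```python
import random


def make_rtd_example(tokens, vocab, replace_prob=0.15, seed=0):
    """Build (corrupted_tokens, labels) for a replaced-token-detection example."""
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < replace_prob:
            # Hypothetical generator: a uniform draw from the vocabulary
            # stands in for sampling from a small generator network.
            new_tok = rng.choice(vocab)
        else:
            new_tok = tok
        corrupted.append(new_tok)
        # Label 1 = replaced, 0 = original. In this sketch a sampled token
        # that happens to equal the original is labeled as original.
        labels.append(int(new_tok != tok))
    return corrupted, labels


tokens = "the chef cooked the meal".split()
vocab = ["the", "chef", "cooked", "meal", "ate", "a"]
corrupted, labels = make_rtd_example(tokens, vocab, replace_prob=0.5, seed=1)
print(corrupted, labels)
```

A discriminator would then be trained with a per-token binary classification loss over `labels`, so the training signal covers the whole sequence.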

BAM! Born-Again Multi-Task Networks for Natural Language Understanding

Jul 10, 2019

Kevin Clark, Minh-Thang Luong, Urvashi Khandelwal, Christopher D. Manning, Quoc V. Le

Abstract: It can be challenging to train multi-task neural networks that outperform or even match their single-task counterparts. To help address this, we propose using knowledge distillation where single-task models teach a multi-task model. We enhance this training with teacher annealing, a novel method that gradually transitions the model from distillation to supervised learning, helping the multi-task model surpass its single-task teachers. We evaluate our approach by multi-task fine-tuning BERT on the GLUE benchmark. Our method consistently improves over standard single-task and multi-task training.

* ACL 2019
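
The gradual transition from distillation to supervised learning described in the BAM abstract can be sketched as a mixed training target. The abstract does not specify the schedule, so the linear ramp below is an assumption, and `annealed_target` is a hypothetical helper name: early in training the student is pushed toward the teacher's predicted distribution, and by the end it is trained on the gold label alone.

```python
def annealed_target(gold_one_hot, teacher_probs, step, total_steps):
    """Mix the gold label and the teacher's prediction into one soft target.

    Assumes a linear schedule: the weight on the gold label grows from 0 at
    step 0 to 1 at step total_steps.
    """
    lam = min(1.0, max(0.0, step / total_steps))
    return [lam * g + (1.0 - lam) * t for g, t in zip(gold_one_hot, teacher_probs)]


gold = [1.0, 0.0]      # one-hot gold label
teacher = [0.6, 0.4]   # hypothetical single-task teacher prediction
for step in (0, 500, 1000):
    print(step, annealed_target(gold, teacher, step, total_steps=1000))
```

The student would be trained with cross-entropy against this target at each step; because the target is a convex combination of two distributions, it always sums to one.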

What Does BERT Look At? An Analysis of BERT's Attention

Jun 11, 2019

Kevin Clark, Urvashi Khandelwal, Omer Levy, Christopher D. Manning

Abstract: Large pre-trained neural networks such as BERT have had great recent success in NLP, motivating a growing body of research investigating what aspects of language they are able to learn from unlabeled data. Most recent analysis has focused on model outputs (e.g., language model surprisal) or internal vector representations (e.g., probing classifiers). Complementary to these works, we propose methods for analyzing the attention mechanisms of pre-trained models and apply them to BERT. BERT's attention heads exhibit patterns such as attending to delimiter tokens, specific positional offsets, or broadly attending over the whole sentence, with heads in the same layer often exhibiting similar behaviors. We further show that certain attention heads correspond well to linguistic notions of syntax and coreference. For example, we find heads that attend to the direct objects of verbs, determiners of nouns, objects of prepositions, and coreferent mentions with remarkably high accuracy. Lastly, we propose an attention-based probing classifier and use it to further demonstrate that substantial syntactic information is captured in BERT's attention.

* BlackBoxNLP 2019
