Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ishaan Gulrajani

Tony

GPT-4o System Card

Oct 25, 2024

OpenAI, :, Aaron Hurst, Adam Lerer, Adam P. Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda(+409 more)

Abstract:GPT-4o is an autoregressive omni model that accepts as input any combination of text, audio, image, and video, and generates any combination of text, audio, and image outputs. It's trained end-to-end across text, vision, and audio, meaning all inputs and outputs are processed by the same neural network. GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in conversation. It matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages, while also being much faster and 50\% cheaper in the API. GPT-4o is especially better at vision and audio understanding compared to existing models. In line with our commitment to building AI safely and consistent with our voluntary commitments to the White House, we are sharing the GPT-4o System Card, which includes our Preparedness Framework evaluations. In this System Card, we provide a detailed look at GPT-4o's capabilities, limitations, and safety evaluations across multiple categories, focusing on speech-to-speech while also evaluating text and image capabilities, and measures we've implemented to ensure the model is safe and aligned. We also include third-party assessments on dangerous capabilities, as well as discussion of potential societal impacts of GPT-4o's text and vision capabilities.

Via

Access Paper or Ask Questions

Likelihood-Based Diffusion Language Models

May 30, 2023

Ishaan Gulrajani, Tatsunori B. Hashimoto

Abstract:Despite a growing interest in diffusion-based language models, existing work has not shown that these models can attain nontrivial likelihoods on standard language modeling benchmarks. In this work, we take the first steps towards closing the likelihood gap between autoregressive and diffusion-based language models, with the goal of building and releasing a diffusion model which outperforms a small but widely-known autoregressive model. We pursue this goal through algorithmic improvements, scaling laws, and increased compute. On the algorithmic front, we introduce several methodological improvements for the maximum-likelihood training of diffusion language models. We then study scaling laws for our diffusion models and find compute-optimal training regimes which differ substantially from autoregressive models. Using our methods and scaling analysis, we train and release Plaid 1B, a large diffusion language model which outperforms GPT-2 124M in likelihood on benchmark datasets and generates fluent samples in unconditional and zero-shot control settings.

Via

Access Paper or Ask Questions

AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback

May 22, 2023

Yann Dubois, Xuechen Li, Rohan Taori, Tianyi Zhang, Ishaan Gulrajani, Jimmy Ba, Carlos Guestrin, Percy Liang, Tatsunori B. Hashimoto

Figure 1 for AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback

Figure 2 for AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback

Figure 3 for AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback

Figure 4 for AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback

Abstract:Large language models (LLMs) such as ChatGPT have seen widespread adoption due to their ability to follow user instructions well. Developing these LLMs involves a complex yet poorly understood workflow requiring training with human feedback. Replicating and understanding this instruction-following process faces three major challenges: the high cost of data collection, the lack of trustworthy evaluation, and the absence of reference method implementations. We address these challenges with AlpacaFarm, a simulator that enables research and development for learning from feedback at a low cost. First, we design LLM prompts to simulate human feedback that are 45x cheaper than crowdworkers and display high agreement with humans. Second, we propose an automatic evaluation and validate it against human instructions obtained on real-world interactions. Third, we contribute reference implementations for several methods (PPO, best-of-n, expert iteration, and more) that learn from pairwise feedback. Finally, as an end-to-end validation of AlpacaFarm, we train and evaluate eleven models on 10k pairs of real human feedback and show that rankings of models trained in AlpacaFarm match rankings of models trained on human data. As a demonstration of the research possible in AlpacaFarm, we find that methods that use a reward model can substantially improve over supervised fine-tuning and that our reference PPO implementation leads to a +10% improvement in win-rate against Davinci003. We release all components of AlpacaFarm at https://github.com/tatsu-lab/alpaca_farm.

Via

Access Paper or Ask Questions

Diffusion-LM Improves Controllable Text Generation

May 27, 2022

Xiang Lisa Li, John Thickstun, Ishaan Gulrajani, Percy Liang, Tatsunori B. Hashimoto

Figure 1 for Diffusion-LM Improves Controllable Text Generation

Figure 2 for Diffusion-LM Improves Controllable Text Generation

Figure 3 for Diffusion-LM Improves Controllable Text Generation

Figure 4 for Diffusion-LM Improves Controllable Text Generation

Abstract:Controlling the behavior of language models (LMs) without re-training is a major open problem in natural language generation. While recent works have demonstrated successes on controlling simple sentence attributes (e.g., sentiment), there has been little progress on complex, fine-grained controls (e.g., syntactic structure). To address this challenge, we develop a new non-autoregressive language model based on continuous diffusions that we call Diffusion-LM. Building upon the recent successes of diffusion models in continuous domains, Diffusion-LM iteratively denoises a sequence of Gaussian vectors into word vectors, yielding a sequence of intermediate latent variables. The continuous, hierarchical nature of these intermediate variables enables a simple gradient-based algorithm to perform complex, controllable generation tasks. We demonstrate successful control of Diffusion-LM for six challenging fine-grained control tasks, significantly outperforming prior work.

Via

Access Paper or Ask Questions

In Search of Lost Domain Generalization

Jul 02, 2020

Ishaan Gulrajani, David Lopez-Paz

Figure 1 for In Search of Lost Domain Generalization

Figure 2 for In Search of Lost Domain Generalization

Figure 3 for In Search of Lost Domain Generalization

Figure 4 for In Search of Lost Domain Generalization

Abstract:The goal of domain generalization algorithms is to predict well on distributions different from those seen during training. While a myriad of domain generalization algorithms exist, inconsistencies in experimental conditions -- datasets, architectures, and model selection criteria -- render fair and realistic comparisons difficult. In this paper, we are interested in understanding how useful domain generalization algorithms are in realistic settings. As a first step, we realize that model selection is non-trivial for domain generalization tasks. Contrary to prior work, we argue that domain generalization algorithms without a model selection strategy should be regarded as incomplete. Next, we implement DomainBed, a testbed for domain generalization including seven multi-domain datasets, nine baseline algorithms, and three model selection criteria. We conduct extensive experiments using DomainBed and find that, when carefully implemented, empirical risk minimization shows state-of-the-art performance across all datasets. Looking forward, we hope that the release of DomainBed, along with contributions from fellow researchers, will streamline reproducible and rigorous research in domain generalization.

Via

Access Paper or Ask Questions

Towards GAN Benchmarks Which Require Generalization

Jan 10, 2020

Ishaan Gulrajani, Colin Raffel, Luke Metz

Figure 1 for Towards GAN Benchmarks Which Require Generalization

Figure 2 for Towards GAN Benchmarks Which Require Generalization

Figure 3 for Towards GAN Benchmarks Which Require Generalization

Figure 4 for Towards GAN Benchmarks Which Require Generalization

Abstract:For many evaluation metrics commonly used as benchmarks for unconditional image generation, trivially memorizing the training set attains a better score than models which are considered state-of-the-art; we consider this problematic. We clarify a necessary condition for an evaluation metric not to behave this way: estimating the function must require a large sample from the model. In search of such a metric, we turn to neural network divergences (NNDs), which are defined in terms of a neural network trained to distinguish between distributions. The resulting benchmarks cannot be "won" by training set memorization, while still being perceptually correlated and computable only from samples. We survey past work on using NNDs for evaluation and implement an example black-box metric based on these ideas. Through experimental validation we show that it can effectively measure diversity, sample quality, and generalization.

* ICLR 2019 conference paper

Via

Access Paper or Ask Questions

Invariant Risk Minimization

Jul 05, 2019

Martin Arjovsky, Léon Bottou, Ishaan Gulrajani, David Lopez-Paz

Figure 1 for Invariant Risk Minimization

Figure 2 for Invariant Risk Minimization

Figure 3 for Invariant Risk Minimization

Figure 4 for Invariant Risk Minimization

Abstract:We introduce Invariant Risk Minimization (IRM), a learning paradigm to estimate invariant correlations across multiple training distributions. To achieve this goal, IRM learns a data representation such that the optimal classifier, on top of that data representation, matches for all training distributions. Through theory and experiments, we show how the invariances learned by IRM relate to the causal structures governing the data and enable out-of-distribution generalization.

Via

Access Paper or Ask Questions

GANSynth: Adversarial Neural Audio Synthesis

Apr 15, 2019

Jesse Engel, Kumar Krishna Agrawal, Shuo Chen, Ishaan Gulrajani, Chris Donahue, Adam Roberts

Figure 1 for GANSynth: Adversarial Neural Audio Synthesis

Figure 2 for GANSynth: Adversarial Neural Audio Synthesis

Figure 3 for GANSynth: Adversarial Neural Audio Synthesis

Figure 4 for GANSynth: Adversarial Neural Audio Synthesis

Abstract:Efficient audio synthesis is an inherently difficult machine learning task, as human perception is sensitive to both global structure and fine-scale waveform coherence. Autoregressive models, such as WaveNet, model local structure at the expense of global latent structure and slow iterative sampling, while Generative Adversarial Networks (GANs), have global latent conditioning and efficient parallel sampling, but struggle to generate locally-coherent audio waveforms. Herein, we demonstrate that GANs can in fact generate high-fidelity and locally-coherent audio by modeling log magnitudes and instantaneous frequencies with sufficient frequency resolution in the spectral domain. Through extensive empirical investigations on the NSynth dataset, we demonstrate that GANs are able to outperform strong WaveNet baselines on automated and human evaluation metrics, and efficiently generate audio several orders of magnitude faster than their autoregressive counterparts.

* Colab Notebook: http://goo.gl/magenta/gansynth-demo

Via

Access Paper or Ask Questions

Improved Training of Wasserstein GANs

Dec 25, 2017

Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, Aaron Courville

Figure 1 for Improved Training of Wasserstein GANs

Figure 2 for Improved Training of Wasserstein GANs

Figure 3 for Improved Training of Wasserstein GANs

Figure 4 for Improved Training of Wasserstein GANs

Abstract:Generative Adversarial Networks (GANs) are powerful generative models, but suffer from training instability. The recently proposed Wasserstein GAN (WGAN) makes progress toward stable training of GANs, but sometimes can still generate only low-quality samples or fail to converge. We find that these problems are often due to the use of weight clipping in WGAN to enforce a Lipschitz constraint on the critic, which can lead to undesired behavior. We propose an alternative to clipping weights: penalize the norm of gradient of the critic with respect to its input. Our proposed method performs better than standard WGAN and enables stable training of a wide variety of GAN architectures with almost no hyperparameter tuning, including 101-layer ResNets and language models over discrete data. We also achieve high quality generations on CIFAR-10 and LSUN bedrooms.

* NIPS camera-ready

Via

Access Paper or Ask Questions

SampleRNN: An Unconditional End-to-End Neural Audio Generation Model

Feb 11, 2017

Soroush Mehri, Kundan Kumar, Ishaan Gulrajani, Rithesh Kumar, Shubham Jain, Jose Sotelo, Aaron Courville, Yoshua Bengio

Figure 1 for SampleRNN: An Unconditional End-to-End Neural Audio Generation Model

Figure 2 for SampleRNN: An Unconditional End-to-End Neural Audio Generation Model

Figure 3 for SampleRNN: An Unconditional End-to-End Neural Audio Generation Model

Figure 4 for SampleRNN: An Unconditional End-to-End Neural Audio Generation Model

Abstract:In this paper we propose a novel model for unconditional audio generation based on generating one audio sample at a time. We show that our model, which profits from combining memory-less modules, namely autoregressive multilayer perceptrons, and stateful recurrent neural networks in a hierarchical structure is able to capture underlying sources of variations in the temporal sequences over very long time spans, on three datasets of different nature. Human evaluation on the generated samples indicate that our model is preferred over competing models. We also show how each component of the model contributes to the exhibited performance.

* Published as a conference paper at ICLR 2017

Via

Access Paper or Ask Questions