Jonathan Gordon

OpenAI o1 System Card

Dec 21, 2024

OpenAI: Aaron Jaech, Adam Kalai, Adam Lerer, Adam Richardson, Ahmed El-Kishky, Aiden Low, Alec Helyar, Aleksander Madry (+253 more)

Abstract: The o1 model series is trained with large-scale reinforcement learning to reason using chain of thought. These advanced reasoning capabilities provide new avenues for improving the safety and robustness of our models. In particular, our models can reason about our safety policies in context when responding to potentially unsafe prompts, through deliberative alignment. This leads to state-of-the-art performance on certain benchmarks for risks such as generating illicit advice, choosing stereotyped responses, and succumbing to known jailbreaks. Training models to incorporate a chain of thought before answering has the potential to unlock substantial benefits, while also increasing potential risks that stem from heightened intelligence. Our results underscore the need for building robust alignment methods, extensively stress-testing their efficacy, and maintaining meticulous risk management protocols. This report outlines the safety work carried out for the OpenAI o1 and OpenAI o1-mini models, including safety evaluations, external red teaming, and Preparedness Framework evaluations.

Evolution through Large Models

Jun 17, 2022

Joel Lehman, Jonathan Gordon, Shawn Jain, Kamal Ndousse, Cathy Yeh, Kenneth O. Stanley

Abstract: This paper pursues the insight that large language models (LLMs) trained to generate code can vastly improve the effectiveness of mutation operators applied to programs in genetic programming (GP). Because such LLMs benefit from training data that includes sequential changes and modifications, they can approximate likely changes that humans would make. To highlight the breadth of implications of such evolution through large models (ELM), in the main experiment ELM combined with MAP-Elites generates hundreds of thousands of functional examples of Python programs that output working ambulating robots in the Sodarace domain, which the original LLM had never seen in pre-training. These examples then help to bootstrap training a new conditional language model that can output the right walker for a particular terrain. The ability to bootstrap new models that can output appropriate artifacts for a given context in a domain where zero training data was previously available carries implications for open-endedness, deep learning, and reinforcement learning. These implications are explored here in depth in the hope of inspiring new directions of research now opened up by ELM.
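
For readers unfamiliar with MAP-Elites, the loop that the abstract pairs with ELM can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the toy problem, the function names (`map_elites`, `toy_evaluate`, `toy_mutate`, `toy_random`), and the Gaussian mutation are all hypothetical. In ELM the mutation step would instead be an LLM proposing a change to a program.

```python
import random

def map_elites(evaluate, mutate, random_solution, n_bins, iterations, seed=0):
    """Minimal MAP-Elites loop: keep the best solution found per behaviour bin."""
    rng = random.Random(seed)
    archive = {}  # bin index -> (fitness, solution)
    for _ in range(iterations):
        if archive:
            # Pick an existing elite uniformly at random and perturb it.
            _, parent = rng.choice(list(archive.values()))
            child = mutate(parent, rng)
        else:
            # Archive is empty: start from a random solution.
            child = random_solution(rng)
        fitness, behaviour = evaluate(child)
        # Map the behaviour descriptor in [0, 1) to a discrete bin.
        b = min(int(behaviour * n_bins), n_bins - 1)
        # Keep the child only if its bin is empty or it beats the current elite.
        if b not in archive or fitness > archive[b][0]:
            archive[b] = (fitness, child)
    return archive

# Hypothetical toy problem: solutions are points in [0, 1)^2.
def toy_evaluate(x):
    fitness = -(x[1] - 0.5) ** 2  # best when the second coordinate is 0.5
    behaviour = x[0]              # first coordinate is the behaviour descriptor
    return fitness, behaviour

def toy_mutate(x, rng):
    # Gaussian perturbation, clipped to [0, 0.999]; a stand-in for an LLM-proposed edit.
    return [min(max(v + rng.gauss(0, 0.1), 0.0), 0.999) for v in x]

def toy_random(rng):
    return [rng.random(), rng.random()]

archive = map_elites(toy_evaluate, toy_mutate, toy_random, n_bins=10, iterations=2000)
```

The archive holds at most one elite per behaviour bin, so the output is a set of diverse, locally best solutions rather than a single optimum.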

The Gaussian Neural Process

Jan 10, 2021

Wessel P. Bruinsma, James Requeima, Andrew Y. K. Foong, Jonathan Gordon, Richard E. Turner

Abstract: Neural Processes (NPs; Garnelo et al., 2018a,b) are a rich class of models for meta-learning that map data sets directly to predictive stochastic processes. We provide a rigorous analysis of the standard maximum-likelihood objective used to train conditional NPs. Moreover, we propose a new member of the Neural Process family called the Gaussian Neural Process (GNP), which models predictive correlations, incorporates translation equivariance, provides universal approximation guarantees, and demonstrates encouraging performance.

* 34 pages; includes supplementary material; to appear in AABI 2020

Meta-Learning Stationary Stochastic Process Prediction with Convolutional Neural Processes

Jul 02, 2020

Andrew Y. K. Foong, Wessel P. Bruinsma, Jonathan Gordon, Yann Dubois, James Requeima, Richard E. Turner

Abstract: Stationary stochastic processes (SPs) are a key component of many probabilistic models, such as those for off-the-grid spatio-temporal data. They enable the statistical symmetry of underlying physical phenomena to be leveraged, thereby aiding generalization. Prediction in such models can be viewed as a translation equivariant map from observed data sets to predictive SPs, emphasizing the intimate relationship between stationarity and equivariance. Building on this, we propose the Convolutional Neural Process (ConvNP), which endows Neural Processes (NPs) with translation equivariance and extends convolutional conditional NPs to allow for dependencies in the predictive distribution. The latter enables ConvNPs to be deployed in settings which require coherent samples, such as Thompson sampling or conditional image completion. Moreover, we propose a new maximum-likelihood objective to replace the standard ELBO objective in NPs, which conceptually simplifies the framework and empirically improves performance. We demonstrate the strong performance and generalization capabilities of ConvNPs on 1D regression, image completion, and various tasks with real-world spatio-temporal data.

Predictive Complexity Priors

Jul 01, 2020

Eric Nalisnick, Jonathan Gordon, José Miguel Hernández-Lobato

Abstract: Specifying a Bayesian prior is notoriously difficult for complex models such as neural networks. Reasoning about parameters is made challenging by the high-dimensionality and over-parameterization of the space. Priors that seem benign and uninformative can have unintuitive and detrimental effects on a model's predictions. For this reason, we propose predictive complexity priors: a functional prior that is defined by comparing the model's predictions to those of a reference function. Although originally defined on the model outputs, we transfer the prior to the model parameters via a change of variables. The traditional Bayesian workflow can then proceed as usual. We apply our predictive complexity prior to modern machine learning tasks such as reasoning over neural network depth and sharing of statistical strength for few-shot learning.

* 22 pages

TaskNorm: Rethinking Batch Normalization for Meta-Learning

Mar 06, 2020

John Bronskill, Jonathan Gordon, James Requeima, Sebastian Nowozin, Richard E. Turner

Abstract: Modern meta-learning approaches for image classification rely on increasingly deep networks to achieve state-of-the-art performance, making batch normalization an essential component of meta-learning pipelines. However, the hierarchical nature of the meta-learning setting presents several challenges that can render conventional batch normalization ineffective, giving rise to the need to rethink normalization in this setting. We evaluate a range of approaches to batch normalization for meta-learning scenarios, and develop a novel approach that we call TaskNorm. Experiments on fourteen datasets demonstrate that the choice of batch normalization has a dramatic effect on both classification accuracy and training time for both gradient-based and gradient-free meta-learning approaches. Importantly, TaskNorm is found to consistently improve performance. Finally, we provide a set of best practices for normalization that will allow fair comparison of meta-learning algorithms.

Convolutional Conditional Neural Processes

Oct 29, 2019

Jonathan Gordon, Wessel P. Bruinsma, Andrew Y. K. Foong, James Requeima, Yann Dubois, Richard E. Turner

Abstract: We introduce the Convolutional Conditional Neural Process (ConvCNP), a new member of the Neural Process family that models translation equivariance in the data. Translation equivariance is an important inductive bias for many learning problems including time series modelling, spatial data, and images. The model embeds data sets into an infinite-dimensional function space as opposed to a finite-dimensional vector space. To formalize this notion, we extend the theory of neural representations of sets to include functional representations, and demonstrate that any translation-equivariant embedding can be represented using a convolutional deep set. We evaluate ConvCNPs in several settings, demonstrating that they achieve state-of-the-art performance compared to existing NPs. We demonstrate that building in translation equivariance enables zero-shot generalization to challenging, out-of-domain tasks.
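
The translation-equivariance property mentioned in the abstract can be illustrated on a small discrete example: shifting the input of a convolution and then convolving gives the same result as convolving and then shifting. The sketch below is only an illustration on a periodic 1D grid with integer values. It is not the ConvCNP model, which operates on off-the-grid data, and the helper names `circular_conv` and `shift` are hypothetical.

```python
def circular_conv(signal, kernel):
    """Circular (periodic) 1D convolution of a signal with a kernel."""
    n = len(signal)
    return [
        sum(kernel[j] * signal[(i - j) % n] for j in range(len(kernel)))
        for i in range(n)
    ]

def shift(signal, k):
    """Cyclically shift a signal to the right by k positions."""
    k %= len(signal)
    return signal[-k:] + signal[:-k] if k else list(signal)

x = [3, 1, 4, 1, 5, 9, 2, 6]
kernel = [1, -2, 1]

# Equivariance: convolving a shifted signal equals shifting the convolved signal.
lhs = circular_conv(shift(x, 3), kernel)
rhs = shift(circular_conv(x, kernel), 3)
assert lhs == rhs
```

Integer inputs are used so the two sides can be compared exactly. With floating-point values the comparison would need a tolerance.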

Bayesian Batch Active Learning as Sparse Subset Approximation

Aug 06, 2019

Robert Pinsler, Jonathan Gordon, Eric Nalisnick, José Miguel Hernández-Lobato

Abstract: Leveraging the wealth of unlabeled data produced in recent years provides great potential for improving supervised models. When the cost of acquiring labels is high, probabilistic active learning methods can be used to greedily select the most informative data points to be labeled. However, for many large-scale problems standard greedy procedures become computationally infeasible and suffer from negligible model change. In this paper, we introduce a novel Bayesian batch active learning approach that mitigates these issues. Our approach is motivated by approximating the complete data posterior of the model parameters. While naive batch construction methods result in correlated queries, our algorithm produces diverse batches that enable efficient active learning at scale. We derive interpretable closed-form solutions akin to existing active learning procedures for linear models, and generalize to arbitrary models using random projections. We demonstrate the benefits of our approach on several large-scale regression and classification tasks.

Fast and Flexible Multi-Task Classification Using Conditional Neural Adaptive Processes

Jun 18, 2019

James Requeima, Jonathan Gordon, John Bronskill, Sebastian Nowozin, Richard E. Turner

Abstract: The goal of this paper is to design image classification systems that, after an initial multi-task training phase, can automatically adapt to new tasks encountered at test time. We introduce a conditional neural process based approach to the multi-task classification setting for this purpose, and establish connections to the meta-learning and few-shot learning literature. The resulting approach, called CNAPs, comprises a classifier whose parameters are modulated by an adaptation network that takes the current task's dataset as input. We demonstrate that CNAPs achieves state-of-the-art results on the challenging Meta-Dataset benchmark, indicating high-quality transfer learning. We show that the approach is robust, avoiding both over-fitting in low-shot regimes and under-fitting in high-shot regimes. Timing experiments reveal that CNAPs is computationally efficient at test time as it does not involve gradient-based adaptation. Finally, we show that trained models are immediately deployable to continual learning and active learning where they can outperform existing approaches that do not leverage transfer learning.

Probabilistic Neural Architecture Search

Feb 13, 2019

Francesco Paolo Casale, Jonathan Gordon, Nicolo Fusi

Abstract: In neural architecture search (NAS), the space of neural network architectures is automatically explored to maximize predictive accuracy for a given task. Despite the success of recent approaches, most existing methods cannot be directly applied to large scale problems because of their prohibitive computational complexity or high memory usage. In this work, we propose a Probabilistic approach to neural ARchitecture SEarCh (PARSEC) that drastically reduces memory requirements while maintaining state-of-the-art computational complexity, making it possible to directly search over more complex architectures and larger datasets. Our approach only requires as much memory as is needed to train a single architecture from our search space. This is due to a memory-efficient sampling procedure wherein we learn a probability distribution over high-performing neural network architectures. Importantly, this framework enables us to transfer the distribution of architectures learnt on smaller problems to larger ones, further reducing the computational cost. We showcase the advantages of our approach in applications to CIFAR-10 and ImageNet, where our approach outperforms methods with double its computational cost and matches the performance of methods with costs that are three orders of magnitude larger.
