Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Gregory W. Benton

Loss Surface Simplexes for Mode Connecting Volumes and Fast Ensembling

Feb 25, 2021

Gregory W. Benton, Wesley J. Maddox, Sanae Lotfi, Andrew Gordon Wilson

Figure 1 for Loss Surface Simplexes for Mode Connecting Volumes and Fast Ensembling

Figure 2 for Loss Surface Simplexes for Mode Connecting Volumes and Fast Ensembling

Figure 3 for Loss Surface Simplexes for Mode Connecting Volumes and Fast Ensembling

Figure 4 for Loss Surface Simplexes for Mode Connecting Volumes and Fast Ensembling

Abstract:With a better understanding of the loss surfaces for multilayer networks, we can build more robust and accurate training procedures. Recently it was discovered that independently trained SGD solutions can be connected along one-dimensional paths of near-constant training loss. In this paper, we show that there are mode-connecting simplicial complexes that form multi-dimensional manifolds of low loss, connecting many independently trained models. Inspired by this discovery, we show how to efficiently build simplicial complexes for fast ensembling, outperforming independently trained deep ensembles in accuracy, calibration, and robustness to dataset shift. Notably, our approach only requires a few training epochs to discover a low-loss simplex, starting from a pre-trained solution. Code is available at https://github.com/g-benton/loss-surface-simplexes.

Via

Access Paper or Ask Questions

Function-Space Distributions over Kernels

Oct 29, 2019

Gregory W. Benton, Wesley J. Maddox, Jayson P. Salkey, Julio Albinati, Andrew Gordon Wilson

Figure 1 for Function-Space Distributions over Kernels

Figure 2 for Function-Space Distributions over Kernels

Figure 3 for Function-Space Distributions over Kernels

Figure 4 for Function-Space Distributions over Kernels

Abstract:Gaussian processes are flexible function approximators, with inductive biases controlled by a covariance kernel. Learning the kernel is the key to representation learning and strong predictive performance. In this paper, we develop functional kernel learning (FKL) to directly infer functional posteriors over kernels. In particular, we place a transformed Gaussian process over a spectral density, to induce a non-parametric distribution over kernel functions. The resulting approach enables learning of rich representations, with support for any stationary kernel, uncertainty over the values of the kernel, and an interpretable specification of a prior directly over kernels, without requiring sophisticated initialization or manual intervention. We perform inference through elliptical slice sampling, which is especially well suited to marginalizing posteriors with the strongly correlated priors typical to function space modelling. We develop our approach for non-uniform, large-scale, multi-task, and multidimensional data, and show promising performance in a wide range of settings, including interpolation, extrapolation, and kernel recovery experiments.

* Published at NeurIPS 2019

Via

Access Paper or Ask Questions