Sebastian W. Ober

Big Batch Bayesian Active Learning by Considering Predictive Probabilities

Jan 14, 2025

Sebastian W. Ober, Samuel Power, Tom Diethe, Henry B. Moss

Abstract: We observe that BatchBALD, a popular acquisition function for batch Bayesian active learning for classification, can conflate epistemic and aleatoric uncertainty, leading to suboptimal performance. Motivated by this observation, we propose to focus on the predictive probabilities, which only exhibit epistemic uncertainty. The result is an acquisition function that not only performs better, but is also faster to evaluate, allowing for larger batches than before.

* 7 pages, 2 figures; presented as a lightning talk at the NeurIPS Workshop on Bayesian Decision-making and Uncertainty (BDU; 2024)
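
The distinction between epistemic and aleatoric uncertainty mentioned in the abstract can be illustrated with the standard entropy decomposition used by BALD-style scores. The sketch below is only that textbook decomposition, not the acquisition function proposed in the paper; the function names and the toy probabilities are assumptions made for illustration.

```python
import numpy as np

def entropy(p, axis=-1):
    """Shannon entropy in nats; probabilities are clipped to avoid log(0)."""
    p = np.clip(p, 1e-12, 1.0)
    return -np.sum(p * np.log(p), axis=axis)

def uncertainty_decomposition(probs):
    """Split predictive uncertainty for a classifier.

    probs has shape (n_posterior_samples, n_points, n_classes): class
    probabilities predicted by each posterior sample for each candidate point.
    """
    mean_probs = probs.mean(axis=0)
    total = entropy(mean_probs)              # entropy of the averaged prediction
    aleatoric = entropy(probs).mean(axis=0)  # average entropy of each sample
    epistemic = total - aleatoric            # mutual information (BALD score)
    return total, aleatoric, epistemic

# Hypothetical toy data: two posterior samples, two candidate points.
# Point 0: both samples agree on a 50/50 prediction (noise, nothing to learn).
# Point 1: the samples make confident but contradictory predictions.
probs = np.array([
    [[0.5, 0.5], [1.0, 0.0]],
    [[0.5, 0.5], [0.0, 1.0]],
])
total, aleatoric, epistemic = uncertainty_decomposition(probs)
```

Both toy points have the same total predictive entropy, yet only the second has a large epistemic term, which is why a score that looks at total entropy alone cannot tell them apart.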

Active learning for affinity prediction of antibodies

Jun 11, 2024

Alexandra Gessner, Sebastian W. Ober, Owen Vickery, Dino Oglić, Talip Uçar

Abstract: The primary objective of most lead optimization campaigns is to enhance the binding affinity of ligands. For large molecules such as antibodies, identifying mutations that enhance antibody affinity is particularly challenging due to the combinatorial explosion of potential mutations. When the structure of the antibody-antigen complex is available, relative binding free energy (RBFE) methods can offer valuable insights into how different mutations will impact the potency and selectivity of a drug candidate, thereby reducing the reliance on costly and time-consuming wet-lab experiments. However, accurately simulating the physics of large molecules is computationally intensive. We present an active learning framework that iteratively proposes promising sequences for simulators to evaluate, thereby accelerating the search for improved binders. We explore different modeling approaches to identify the most effective surrogate model for this task, and evaluate our framework both using pre-computed pools of data and in a realistic full-loop setting.

Recommendations for Baselines and Benchmarking Approximate Gaussian Processes

Feb 15, 2024

Sebastian W. Ober, Artem Artemev, Marcel Wagenländer, Rudolfs Grobins, Mark van der Wilk

Abstract: Gaussian processes (GPs) are a mature and widely used component of the ML toolbox. One of their desirable qualities is automatic hyperparameter selection, which allows for training without user intervention. However, in many realistic settings, approximations are needed, and these typically do require tuning. We argue that this requirement for tuning complicates evaluation, which has led to a lack of clear recommendations on which method should be used in which situation. To address this, we make recommendations for comparing GP approximations based on a specification of what a user should expect from a method. In addition, we develop a training procedure for the variational method of Titsias [2009] that leaves no choices to the user, and show that this is a strong baseline that meets our specification. We conclude that benchmarking according to our suggestions gives a clearer view of the current state of the field, and uncovers problems that are still open that future papers should address.

* Preprint. 25 pages, 16 figures

Towards Improved Variational Inference for Deep Bayesian Models

Jan 23, 2024

Sebastian W. Ober

Abstract: Deep learning has revolutionized the last decade, being at the forefront of extraordinary advances in a wide range of tasks including computer vision, natural language processing, and reinforcement learning, to name but a few. However, it is well known that deep models trained via maximum likelihood estimation tend to be overconfident and give poorly calibrated predictions. Bayesian deep learning attempts to address this by placing priors on the model parameters, which are then combined with a likelihood to perform posterior inference. Unfortunately, for deep models, the true posterior is intractable, forcing the user to resort to approximations. In this thesis, we explore the use of variational inference (VI) as an approximation, as it is unique in simultaneously approximating the posterior and providing a lower bound to the marginal likelihood. If tight enough, this lower bound can be used to optimize hyperparameters and to facilitate model selection. However, this capacity has rarely been used to its full extent for Bayesian neural networks, likely because the approximate posteriors typically used in practice can lack the flexibility to effectively bound the marginal likelihood. We therefore explore three aspects of Bayesian learning for deep models: 1) we ask whether it is necessary to perform inference over as many parameters as possible, or whether it is reasonable to treat many of them as optimizable hyperparameters; 2) we propose a variational posterior that provides a unified view of inference in Bayesian neural networks and deep Gaussian processes; 3) we demonstrate how VI can be improved in certain deep Gaussian process models by analytically removing symmetries from the posterior, and performing inference on Gram matrices instead of features. We hope that our contributions will provide a stepping stone to fully realize the promises of VI in the future.

* PhD Thesis; University of Cambridge

Trieste: Efficiently Exploring The Depths of Black-box Functions with TensorFlow

Feb 16, 2023

Victor Picheny, Joel Berkeley, Henry B. Moss, Hrvoje Stojic, Uri Granta, Sebastian W. Ober, Artem Artemev, Khurram Ghani, Alexander Goodall, Andrei Paleyes (+6 more)

Abstract: We present Trieste, an open-source Python package for Bayesian optimization and active learning benefiting from the scalability and efficiency of TensorFlow. Our library enables the plug-and-play of popular TensorFlow-based models within sequential decision-making loops, e.g. Gaussian processes from GPflow or GPflux, or neural networks from Keras. This modular mindset is central to the package and extends to our acquisition functions and the internal dynamics of the decision-making loop, both of which can be tailored and extended by researchers or engineers when tackling custom use cases. Trieste is a research-friendly and production-ready toolkit backed by a comprehensive test suite and extensive documentation, and is available at https://github.com/secondmind-labs/trieste.

Inducing Point Allocation for Sparse Gaussian Processes in High-Throughput Bayesian Optimisation

Jan 24, 2023

Henry B. Moss, Sebastian W. Ober, Victor Picheny

Abstract: Sparse Gaussian Processes are a key component of high-throughput Bayesian Optimisation (BO) loops; however, we show that existing methods for allocating their inducing points severely hamper optimisation performance. By exploiting the quality-diversity decomposition of Determinantal Point Processes, we propose the first inducing point allocation strategy designed specifically for use in BO. Unlike existing methods, which seek only to reduce global uncertainty in the objective function, our approach provides the local high-fidelity modelling of promising regions required for precise optimisation. More generally, we demonstrate that our proposed framework provides a flexible way to allocate modelling capacity in sparse models and so is suitable for a broad range of downstream sequential decision making tasks.

Information-theoretic Inducing Point Placement for High-throughput Bayesian Optimisation

Jun 06, 2022

Henry B. Moss, Sebastian W. Ober, Victor Picheny

Abstract: Sparse Gaussian Processes are a key component of high-throughput Bayesian optimisation (BO) loops -- an increasingly common setting where evaluation budgets are large and highly parallelised. By using representative subsets of the available data to build approximate posteriors, sparse models dramatically reduce the computational costs of surrogate modelling by relying on a small set of pseudo-observations, the so-called inducing points, in lieu of the full data set. However, current approaches to designing inducing points are not appropriate within BO loops as they seek to reduce global uncertainty in the objective function. Thus, the high-fidelity modelling of promising and data-dense regions required for precise optimisation is sacrificed and computational resources are instead wasted on modelling areas of the space already known to be sub-optimal. Inspired by entropy-based BO methods, we propose a novel inducing point design that uses a principled information-theoretic criterion to select inducing points. By choosing inducing points to maximally reduce both global uncertainty and uncertainty in the maximum value of the objective function, we build surrogate models able to support high-precision high-throughput BO.
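
To make the role of inducing points concrete, the following sketch shows the generic low-rank (Nyström-style) kernel approximation that inducing-point methods build on, where a small set of locations stands in for the full data set. It does not implement the information-theoretic selection criterion from the paper; the RBF kernel, the evenly spaced inducing locations, and all names are assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / lengthscale**2)

def nystrom(x, z, jitter=1e-8):
    """Low-rank approximation K_xz K_zz^{-1} K_zx using inducing inputs z."""
    k_xz = rbf_kernel(x, z)
    k_zz = rbf_kernel(z, z) + jitter * np.eye(len(z))  # jitter for stability
    return k_xz @ np.linalg.solve(k_zz, k_xz.T)

# Hypothetical 1-D data set of 200 inputs on [-3, 3].
rng = np.random.default_rng(0)
x = rng.uniform(-3.0, 3.0, size=(200, 1))
k_full = rbf_kernel(x, x)

# Compare a very small and a moderately sized set of inducing inputs.
z_few = np.linspace(-3.0, 3.0, 5)[:, None]
z_many = np.linspace(-3.0, 3.0, 20)[:, None]
err_few = np.abs(k_full - nystrom(x, z_few)).max()
err_many = np.abs(k_full - nystrom(x, z_many)).max()
```

Here the inducing inputs are spread evenly, which spends capacity uniformly over the input space; the abstract's argument is that within a BO loop this capacity is better concentrated where it matters for locating the optimum.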

A variational approximate posterior for the deep Wishart process

Jul 21, 2021

Sebastian W. Ober, Laurence Aitchison

Abstract: Recent work introduced deep kernel processes as an entirely kernel-based alternative to NNs (Aitchison et al. 2020). Deep kernel processes flexibly learn good top-layer representations by alternately sampling the kernel from a distribution over positive semi-definite matrices and performing nonlinear transformations. One deep kernel process, the deep Wishart process (DWP), is of particular interest because its prior is equivalent to deep Gaussian process (DGP) priors. However, inference in DWPs has not yet been possible due to the lack of sufficiently flexible distributions over positive semi-definite matrices. Here, we give a novel approach to obtaining flexible distributions over positive semi-definite matrices by generalising the Bartlett decomposition of the Wishart probability density. We use this new distribution to develop an approximate posterior for the DWP that includes dependency across layers. We develop a doubly-stochastic inducing-point inference scheme for the DWP and show experimentally that inference in the DWP gives improved performance over doing inference in a DGP with the equivalent prior.

* 20 pages

Last Layer Marginal Likelihood for Invariance Learning

Jun 14, 2021

Pola Elisabeth Schwöbel, Martin Jørgensen, Sebastian W. Ober, Mark van der Wilk

Abstract: Data augmentation is often used to incorporate inductive biases into models. Traditionally, these are hand-crafted and tuned with cross validation. The Bayesian paradigm for model selection provides a path towards end-to-end learning of invariances using only the training data, by optimising the marginal likelihood. We work towards bringing this approach to neural networks by using an architecture with a Gaussian process in the last layer, a model for which the marginal likelihood can be computed. Experimentally, we improve performance by learning appropriate invariances on standard benchmarks, in the low-data regime, and in a medical imaging task. Optimisation challenges for invariant Deep Kernel Gaussian processes are identified, and a systematic analysis is presented to arrive at a robust training scheme. We introduce a new lower bound to the marginal likelihood, which allows us to perform inference for a larger class of likelihood functions than before, thereby overcoming some of the training challenges that existed with previous approaches.

The Promises and Pitfalls of Deep Kernel Learning

Feb 24, 2021

Sebastian W. Ober, Carl E. Rasmussen, Mark van der Wilk

Abstract: Deep kernel learning and related techniques promise to combine the representational power of neural networks with the reliable uncertainty estimates of Gaussian processes. One crucial aspect of these models is an expectation that, because they are treated as Gaussian process models optimized using the marginal likelihood, they are protected from overfitting. However, we identify pathological behavior, including overfitting, on a simple toy example. We explore this pathology, explaining its origins and considering how it applies to real datasets. Through careful experimentation on UCI datasets, CIFAR-10, and the UTKFace dataset, we find that the overfitting from overparameterized deep kernel learning, in which the model is "somewhat Bayesian", can in certain scenarios be worse than that from not being Bayesian at all. However, we find that a fully Bayesian treatment of deep kernel learning can rectify this overfitting and obtain the desired performance improvements over standard neural networks and Gaussian processes.

* 18 pages
