Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Javier Gonzalez

RE-IMAGINE: Symbolic Benchmark Synthesis for Reasoning Evaluation

Jun 18, 2025

Xinnuo Xu, Rachel Lawrence, Kshitij Dubey, Atharva Pandey, Risa Ueno, Fabian Falck, Aditya V. Nori, Rahul Sharma, Amit Sharma, Javier Gonzalez

Abstract:Recent Large Language Models (LLMs) have reported high accuracy on reasoning benchmarks. However, it is still unclear whether the observed results arise from true reasoning or from statistical recall of the training set. Inspired by the ladder of causation (Pearl, 2009) and its three levels (associations, interventions and counterfactuals), this paper introduces RE-IMAGINE, a framework to characterize a hierarchy of reasoning ability in LLMs, alongside an automated pipeline to generate problem variations at different levels of the hierarchy. By altering problems in an intermediate symbolic representation, RE-IMAGINE generates arbitrarily many problems that are not solvable using memorization alone. Moreover, the framework is general and can work across reasoning domains, including math, code, and logic. We demonstrate our framework on four widely-used benchmarks to evaluate several families of LLMs, and observe reductions in performance when the models are queried with problem variations. These assessments indicate a degree of reliance on statistical recall for past performance, and open the door to further research targeting skills across the reasoning hierarchy.

* ICML 2025

Via

Access Paper or Ask Questions

Compositional Causal Reasoning Evaluation in Language Models

Mar 06, 2025

Jacqueline R. M. A. Maasch, Alihan Hüyük, Xinnuo Xu, Aditya V. Nori, Javier Gonzalez

Abstract:Causal reasoning and compositional reasoning are two core aspirations in generative AI. Measuring the extent of these behaviors requires principled evaluation methods. We explore a unified perspective that considers both behaviors simultaneously, termed compositional causal reasoning (CCR): the ability to infer how causal measures compose and, equivalently, how causal quantities propagate through graphs. We instantiate a framework for the systematic evaluation of CCR for the average treatment effect and the probability of necessity and sufficiency. As proof of concept, we demonstrate the design of CCR tasks for language models in the LLama, Phi, and GPT families. On a math word problem, our framework revealed a range of taxonomically distinct error patterns. Additionally, CCR errors increased with the complexity of causal paths for all models except o1.

Via

Access Paper or Ask Questions

JoLT: Joint Probabilistic Predictions on Tabular Data Using LLMs

Feb 17, 2025

Aliaksandra Shysheya, John Bronskill, James Requeima, Shoaib Ahmed Siddiqui, Javier Gonzalez, David Duvenaud, Richard E. Turner

Abstract:We introduce a simple method for probabilistic predictions on tabular data based on Large Language Models (LLMs) called JoLT (Joint LLM Process for Tabular data). JoLT uses the in-context learning capabilities of LLMs to define joint distributions over tabular data conditioned on user-specified side information about the problem, exploiting the vast repository of latent problem-relevant knowledge encoded in LLMs. JoLT defines joint distributions for multiple target variables with potentially heterogeneous data types without any data conversion, data preprocessing, special handling of missing data, or model training, making it accessible and efficient for practitioners. Our experiments show that JoLT outperforms competitive methods on low-shot single-target and multi-target tabular classification and regression tasks. Furthermore, we show that JoLT can automatically handle missing data and perform data imputation by leveraging textual side information. We argue that due to its simplicity and generality, JoLT is an effective approach for a wide variety of real prediction problems.

Via

Access Paper or Ask Questions

AIRIVA: A Deep Generative Model of Adaptive Immune Repertoires

Apr 26, 2023

Melanie F. Pradier, Niranjani Prasad, Paidamoyo Chapfuwa, Sahra Ghalebikesabi, Max Ilse, Steven Woodhouse, Rebecca Elyanow, Javier Zazo, Javier Gonzalez, Julia Greissl(+1 more)

Figure 1 for AIRIVA: A Deep Generative Model of Adaptive Immune Repertoires

Figure 2 for AIRIVA: A Deep Generative Model of Adaptive Immune Repertoires

Figure 3 for AIRIVA: A Deep Generative Model of Adaptive Immune Repertoires

Figure 4 for AIRIVA: A Deep Generative Model of Adaptive Immune Repertoires

Abstract:Recent advances in immunomics have shown that T-cell receptor (TCR) signatures can accurately predict active or recent infection by leveraging the high specificity of TCR binding to disease antigens. However, the extreme diversity of the adaptive immune repertoire presents challenges in reliably identifying disease-specific TCRs. Population genetics and sequencing depth can also have strong systematic effects on repertoires, which requires careful consideration when developing diagnostic models. We present an Adaptive Immune Repertoire-Invariant Variational Autoencoder (AIRIVA), a generative model that learns a low-dimensional, interpretable, and compositional representation of TCR repertoires to disentangle such systematic effects in repertoires. We apply AIRIVA to two infectious disease case-studies: COVID-19 (natural infection and vaccination) and the Herpes Simplex Virus (HSV-1 and HSV-2), and empirically show that we can disentangle the individual disease signals. We further demonstrate AIRIVA's capability to: learn from unlabelled samples; generate in-silico TCR repertoires by intervening on the latent factors; and identify disease-associated TCRs validated using TCR annotations from external assay data.

Via

Access Paper or Ask Questions

Invariant Priors for Bayesian Quadrature

Dec 02, 2021

Masha Naslidnyk, Javier Gonzalez, Maren Mahsereci

Figure 1 for Invariant Priors for Bayesian Quadrature

Figure 2 for Invariant Priors for Bayesian Quadrature

Figure 3 for Invariant Priors for Bayesian Quadrature

Figure 4 for Invariant Priors for Bayesian Quadrature

Abstract:Bayesian quadrature (BQ) is a model-based numerical integration method that is able to increase sample efficiency by encoding and leveraging known structure of the integration task at hand. In this paper, we explore priors that encode invariance of the integrand under a set of bijective transformations in the input domain, in particular some unitary transformations, such as rotations, axis-flips, or point symmetries. We show initial results on superior performance in comparison to standard Bayesian quadrature on several synthetic and one real world application.

Via

Access Paper or Ask Questions

Emulation of physical processes with Emukit

Oct 25, 2021

Andrei Paleyes, Mark Pullin, Maren Mahsereci, Cliff McCollum, Neil D. Lawrence, Javier Gonzalez

Figure 1 for Emulation of physical processes with Emukit

Figure 2 for Emulation of physical processes with Emukit

Figure 3 for Emulation of physical processes with Emukit

Figure 4 for Emulation of physical processes with Emukit

Abstract:Decision making in uncertain scenarios is an ubiquitous challenge in real world systems. Tools to deal with this challenge include simulations to gather information and statistical emulation to quantify uncertainty. The machine learning community has developed a number of methods to facilitate decision making, but so far they are scattered in multiple different toolkits, and generally rely on a fixed backend. In this paper, we present Emukit, a highly adaptable Python toolkit for enriching decision making under uncertainty. Emukit allows users to: (i) use state of the art methods including Bayesian optimization, multi-fidelity emulation, experimental design, Bayesian quadrature and sensitivity analysis; (ii) easily prototype new decision making methods for new problems. Emukit is agnostic to the underlying modeling framework and enables users to use their own custom models. We show how Emukit can be used on three exemplary case studies.

* Second Workshop on Machine Learning and the Physical Sciences, NeurIPS 2019

Via

Access Paper or Ask Questions

RKHS-SHAP: Shapley Values for Kernel Methods

Oct 18, 2021

Siu Lun Chau, Javier Gonzalez, Dino Sejdinovic

Figure 1 for RKHS-SHAP: Shapley Values for Kernel Methods

Figure 2 for RKHS-SHAP: Shapley Values for Kernel Methods

Figure 3 for RKHS-SHAP: Shapley Values for Kernel Methods

Figure 4 for RKHS-SHAP: Shapley Values for Kernel Methods

Abstract:Feature attribution for kernel methods is often heuristic and not individualised for each prediction. To address this, we turn to the concept of Shapley values, a coalition game theoretical framework that has previously been applied to different machine learning model interpretation tasks, such as linear models, tree ensembles and deep networks. By analysing Shapley values from a functional perspective, we propose \textsc{RKHS-SHAP}, an attribution method for kernel machines that can efficiently compute both \emph{Interventional} and \emph{Observational Shapley values} using kernel mean embeddings of distributions. We show theoretically that our method is robust with respect to local perturbations - a key yet often overlooked desideratum for interpretability. Further, we propose \emph{Shapley regulariser}, applicable to a general empirical risk minimisation framework, allowing learning while controlling the level of specific feature's contributions to the model. We demonstrate that the Shapley regulariser enables learning which is robust to covariate shift of a given feature and fair learning which controls the Shapley values of sensitive features.

* 11 pages, 4 figures

Via

Access Paper or Ask Questions

GIBBON: General-purpose Information-Based Bayesian OptimisatioN

Feb 05, 2021

Henry B. Moss, David S. Leslie, Javier Gonzalez, Paul Rayson

Figure 1 for GIBBON: General-purpose Information-Based Bayesian OptimisatioN

Figure 2 for GIBBON: General-purpose Information-Based Bayesian OptimisatioN

Figure 3 for GIBBON: General-purpose Information-Based Bayesian OptimisatioN

Figure 4 for GIBBON: General-purpose Information-Based Bayesian OptimisatioN

Abstract:This paper describes a general-purpose extension of max-value entropy search, a popular approach for Bayesian Optimisation (BO). A novel approximation is proposed for the information gain -- an information-theoretic quantity central to solving a range of BO problems, including noisy, multi-fidelity and batch optimisations across both continuous and highly-structured discrete spaces. Previously, these problems have been tackled separately within information-theoretic BO, each requiring a different sophisticated approximation scheme, except for batch BO, for which no computationally-lightweight information-theoretic approach has previously been proposed. GIBBON (General-purpose Information-Based Bayesian OptimisatioN) provides a single principled framework suitable for all the above, out-performing existing approaches whilst incurring substantially lower computational overheads. In addition, GIBBON does not require the problem's search space to be Euclidean and so is the first high-performance yet computationally light-weight acquisition function that supports batch BO over general highly structured input spaces like molecular search and gene design. Moreover, our principled derivation of GIBBON yields a natural interpretation of a popular batch BO heuristic based on determinantal point processes. Finally, we analyse GIBBON across a suite of synthetic benchmark tasks, a molecular search loop, and as part of a challenging batch multi-fidelity framework for problems with controllable experimental noise.

Via

Access Paper or Ask Questions

Good practices for Bayesian Optimization of high dimensional structured spaces

Jan 06, 2021

Eero Siivola, Javier Gonzalez, Andrei Paleyes, Aki Vehtari

Figure 1 for Good practices for Bayesian Optimization of high dimensional structured spaces

Figure 2 for Good practices for Bayesian Optimization of high dimensional structured spaces

Figure 3 for Good practices for Bayesian Optimization of high dimensional structured spaces

Figure 4 for Good practices for Bayesian Optimization of high dimensional structured spaces

Abstract:The increasing availability of structured but high dimensional data has opened new opportunities for optimization. One emerging and promising avenue is the exploration of unsupervised methods for projecting structured high dimensional data into low dimensional continuous representations, simplifying the optimization problem and enabling the application of traditional optimization methods. However, this line of research has been purely methodological with little connection to the needs of practitioners so far. In this paper, we study the effect of different search space design choices for performing Bayesian Optimization in high dimensional structured datasets. In particular, we analyse the influence of the dimensionality of the latent space, the role of the acquisition function and evaluate new methods to automatically define the optimization bounds in the latent space. Finally, based on experimental results using synthetic and real datasets, we provide recommendations for the practitioners.

Via

Access Paper or Ask Questions

BOSS: Bayesian Optimization over String Spaces

Oct 02, 2020

Henry B. Moss, Daniel Beck, Javier Gonzalez, David S. Leslie, Paul Rayson

Figure 1 for BOSS: Bayesian Optimization over String Spaces

Figure 2 for BOSS: Bayesian Optimization over String Spaces

Figure 3 for BOSS: Bayesian Optimization over String Spaces

Figure 4 for BOSS: Bayesian Optimization over String Spaces

Abstract:This article develops a Bayesian optimization (BO) method which acts directly over raw strings, proposing the first uses of string kernels and genetic algorithms within BO loops. Recent applications of BO over strings have been hindered by the need to map inputs into a smooth and unconstrained latent space. Learning this projection is computationally and data-intensive. Our approach instead builds a powerful Gaussian process surrogate model based on string kernels, naturally supporting variable length inputs, and performs efficient acquisition function maximization for spaces with syntactical constraints. Experiments demonstrate considerably improved optimization over existing approaches across a broad range of constraints, including the popular setting where syntax is governed by a context-free grammar.

Via

Access Paper or Ask Questions