Francisco J. R. Ruiz

AlphaEvolve: A coding agent for scientific and algorithmic discovery

Jun 16, 2025

Alexander Novikov, Ngân Vũ, Marvin Eisenberger, Emilien Dupont, Po-Sen Huang, Adam Zsolt Wagner, Sergey Shirobokov, Borislav Kozlovskii, Francisco J. R. Ruiz, Abbas Mehrabian (+8 more)

Abstract: In this white paper, we present AlphaEvolve, an evolutionary coding agent that substantially enhances capabilities of state-of-the-art LLMs on highly challenging tasks such as tackling open scientific problems or optimizing critical pieces of computational infrastructure. AlphaEvolve orchestrates an autonomous pipeline of LLMs, whose task is to improve an algorithm by making direct changes to the code. Using an evolutionary approach, continuously receiving feedback from one or more evaluators, AlphaEvolve iteratively improves the algorithm, potentially leading to new scientific and practical discoveries. We demonstrate the broad applicability of this approach by applying it to a number of important computational problems. When applied to optimizing critical components of large-scale computational stacks at Google, AlphaEvolve developed a more efficient scheduling algorithm for data centers, found a functionally equivalent simplification in the circuit design of hardware accelerators, and accelerated the training of the LLM underpinning AlphaEvolve itself. Furthermore, AlphaEvolve discovered novel, provably correct algorithms that surpass state-of-the-art solutions on a spectrum of problems in mathematics and computer science, significantly expanding the scope of prior automated discovery methods (Romera-Paredes et al., 2023). Notably, AlphaEvolve developed a search algorithm that found a procedure to multiply two $4 \times 4$ complex-valued matrices using $48$ scalar multiplications, offering the first improvement, after 56 years, over Strassen's algorithm in this setting. We believe AlphaEvolve and coding agents like it can have a significant impact in improving solutions of problems across many areas of science and computation.
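For context on the matrix-multiplication result, the following is a minimal sketch of the baseline it improves on: Strassen's recursion, which multiplies 2 x 2 blocks with 7 products and therefore needs 7 x 7 = 49 scalar multiplications for a $4 \times 4$ product. It is not AlphaEvolve's 48-multiplication procedure, which the abstract does not spell out; the function name `strassen` and the `counter` argument are illustrative choices.

```python
import numpy as np


def strassen(A, B, counter):
    """Multiply square matrices (size a power of two) with Strassen's recursion.

    `counter` is a one-element list that accumulates scalar multiplications.
    """
    n = A.shape[0]
    if n == 1:
        counter[0] += 1  # one scalar multiplication at each leaf
        return A * B
    h = n // 2
    a11, a12, a21, a22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    b11, b12, b21, b22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # Seven block products instead of the naive eight.
    m1 = strassen(a11 + a22, b11 + b22, counter)
    m2 = strassen(a21 + a22, b11, counter)
    m3 = strassen(a11, b12 - b22, counter)
    m4 = strassen(a22, b21 - b11, counter)
    m5 = strassen(a11 + a12, b22, counter)
    m6 = strassen(a21 - a11, b11 + b12, counter)
    m7 = strassen(a12 - a22, b21 + b22, counter)
    top = np.hstack([m1 + m4 - m5 + m7, m3 + m5])
    bottom = np.hstack([m2 + m4, m1 - m2 + m3 + m6])
    return np.vstack([top, bottom])


# Two random complex-valued 4x4 matrices, as in the setting the abstract describes.
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
B = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
counter = [0]
C = strassen(A, B, counter)
```

After the call, `counter[0]` holds the number of scalar multiplications used by the recursion (49 for a 4 x 4 input), and `C` can be compared against `A @ B`.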

FunBO: Discovering Acquisition Functions for Bayesian Optimization with FunSearch

Jun 07, 2024

Virginia Aglietti, Ira Ktena, Jessica Schrouff, Eleni Sgouritsa, Francisco J. R. Ruiz, Alexis Bellot, Silvia Chiappa

Abstract: The sample efficiency of Bayesian optimization algorithms depends on carefully crafted acquisition functions (AFs) guiding the sequential collection of function evaluations. The best-performing AF can vary significantly across optimization problems, often requiring ad-hoc and problem-specific choices. This work tackles the challenge of designing novel AFs that perform well across a variety of experimental settings. Based on FunSearch, a recent work using Large Language Models (LLMs) for discovery in mathematical sciences, we propose FunBO, an LLM-based method that can be used to learn new AFs written in computer code by leveraging access to a limited number of evaluations for a set of objective functions. We provide the analytic expression of all discovered AFs and evaluate them on various global optimization benchmarks and hyperparameter optimization tasks. We show how FunBO identifies AFs that generalize well in and out of the training distribution of functions, thus outperforming established general-purpose AFs and achieving competitive performance against AFs that are customized to specific function types and are learned via transfer-learning algorithms.
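To make "acquisition function written in computer code" concrete, here is a minimal sketch of one widely used general-purpose AF, Expected Improvement (EI), for maximization under a Gaussian posterior. The abstract does not name the baselines it compares against, so EI is chosen here only as a familiar illustration; the function name and argument names are assumptions, and this is not one of the AFs discovered by FunBO.

```python
import math


def expected_improvement(mean, std, best_so_far):
    """Expected Improvement at one candidate point (maximization).

    `mean` and `std` are the posterior mean and standard deviation of the
    surrogate model at the candidate; `best_so_far` is the incumbent value.
    """
    if std <= 0.0:
        # Degenerate posterior: the improvement is deterministic.
        return max(mean - best_so_far, 0.0)
    z = (mean - best_so_far) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    return (mean - best_so_far) * cdf + std * pdf


# Score two hypothetical candidates against an incumbent value of 1.0.
ei_uncertain = expected_improvement(mean=0.8, std=0.5, best_so_far=1.0)
ei_certain = expected_improvement(mean=0.8, std=0.0, best_so_far=1.0)
```

A point whose mean is below the incumbent still receives positive EI when its uncertainty is large, which is the exploration behaviour an AF has to trade off against exploitation.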

Quantum Circuit Optimization with AlphaTensor

Mar 05, 2024

Francisco J. R. Ruiz, Tuomas Laakkonen, Johannes Bausch, Matej Balog, Mohammadamin Barekatain, Francisco J. H. Heras, Alexander Novikov, Nathan Fitzpatrick, Bernardino Romera-Paredes, John van de Wetering (+3 more)

Abstract: A key challenge in realizing fault-tolerant quantum computers is circuit optimization. Focusing on the most expensive gates in fault-tolerant quantum computation (namely, the T gates), we address the problem of T-count optimization, i.e., minimizing the number of T gates that are needed to implement a given circuit. To achieve this, we develop AlphaTensor-Quantum, a method based on deep reinforcement learning that exploits the relationship between optimizing T-count and tensor decomposition. Unlike existing methods for T-count optimization, AlphaTensor-Quantum can incorporate domain-specific knowledge about quantum computation and leverage gadgets, which significantly reduces the T-count of the optimized circuits. AlphaTensor-Quantum outperforms the existing methods for T-count optimization on a set of arithmetic benchmarks (even when compared without making use of gadgets). Remarkably, it discovers an efficient algorithm akin to Karatsuba's method for multiplication in finite fields. AlphaTensor-Quantum also finds the best human-designed solutions for relevant arithmetic computations used in Shor's algorithm and for quantum chemistry simulation, thus demonstrating it can save hundreds of hours of research by optimizing relevant quantum circuits in a fully automated way.

* 25 pages main paper + 19 pages appendix

Order Matters: Probabilistic Modeling of Node Sequence for Graph Generation

Jun 14, 2021

Xiaohui Chen, Xu Han, Jiajing Hu, Francisco J. R. Ruiz, Liping Liu

Abstract: A graph generative model defines a distribution over graphs. One type of generative model is constructed by autoregressive neural networks, which sequentially add nodes and edges to generate a graph. However, the likelihood of a graph under the autoregressive model is intractable, as there are numerous sequences leading to the given graph; this makes maximum likelihood estimation challenging. Instead, in this work we derive the exact joint probability over the graph and the node ordering of the sequential process. From the joint, we approximately marginalize out the node orderings and compute a lower bound on the log-likelihood using variational inference. We train graph generative models by maximizing this bound, without using the ad-hoc node orderings of previous methods. Our experiments show that the log-likelihood bound is significantly tighter than the bound of previous schemes. Moreover, the models fitted with the proposed algorithm can generate high-quality graphs that match the structures of target graphs not seen during training. We have made our code publicly available at https://github.com/tufts-ml/graph-generation-vi.

* ICML 2021

VarGrad: A Low-Variance Gradient Estimator for Variational Inference

Oct 29, 2020

Lorenz Richter, Ayman Boustati, Nikolas Nüsken, Francisco J. R. Ruiz, Ömer Deniz Akyildiz

Abstract: We analyse the properties of an unbiased gradient estimator of the ELBO for variational inference, based on the score function method with leave-one-out control variates. We show that this gradient estimator can be obtained using a new loss, defined as the variance of the log-ratio between the exact posterior and the variational approximation, which we call the log-variance loss. Under certain conditions, the gradient of the log-variance loss equals the gradient of the (negative) ELBO. We show theoretically that this gradient estimator, which we call VarGrad due to its connection to the log-variance loss, exhibits lower variance than the score function method in certain settings, and that the leave-one-out control variate coefficients are close to the optimal ones. We empirically demonstrate that VarGrad offers a favourable variance versus computation trade-off compared to other state-of-the-art estimators on a discrete VAE.
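The link between a variance-based loss and a score-function estimator can be illustrated with a small sketch. Differentiating the sample variance of the log-ratio f(z) = log q(z) - log p(x, z) with respect to the variational parameters, while holding the samples fixed, gives (up to a factor of 2) a score-function estimator with a mean baseline. The toy below is an assumption-laden illustration, not the paper's experiment: it uses a one-dimensional Gaussian q(z) = N(mu, 1) and an unnormalized Gaussian target centred at m, for which the exact gradient of the negative ELBO with respect to mu is mu - m. The name `vargrad_estimate` and the normalization are illustrative choices.

```python
import numpy as np


def vargrad_estimate(f, score):
    """Half the gradient of the sample variance of f, samples held fixed.

    f:     shape (S,), log-ratio log q(z_s) - log p(x, z_s) per sample
    score: shape (S, D), gradient of log q(z_s) w.r.t. the D variational parameters
    """
    num_samples = f.shape[0]
    centred = f - f.mean()  # subtracting the mean acts as the baseline
    return centred @ score / (num_samples - 1)


# Toy problem (an assumption for illustration): q(z) = N(mu, 1), target N(m, 1).
rng = np.random.default_rng(0)
mu, m = 0.5, 2.0
z = rng.normal(loc=mu, scale=1.0, size=200_000)
log_q = -0.5 * (z - mu) ** 2   # additive constants cancel after centring
log_p = -0.5 * (z - m) ** 2    # unnormalized target is sufficient
f = log_q - log_p
score = (z - mu)[:, None]      # d/dmu log N(z; mu, 1) = z - mu
grad = vargrad_estimate(f, score)
```

In this toy, `grad[0]` can be compared against the exact value `mu - m`; note that the normalizing constant of the target never needs to be known because it drops out when `f` is centred.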

Unbiased Gradient Estimation for Variational Auto-Encoders using Coupled Markov Chains

Oct 05, 2020

Francisco J. R. Ruiz, Michalis K. Titsias, Taylan Cemgil, Arnaud Doucet

Abstract: The variational auto-encoder (VAE) is a deep latent variable model that has two neural networks in an autoencoder-like architecture; one of them parameterizes the model's likelihood. Fitting its parameters via maximum likelihood is challenging since the computation of the likelihood involves an intractable integral over the latent space; thus the VAE is trained instead by maximizing a variational lower bound. Here, we develop a maximum likelihood training scheme for VAEs by introducing unbiased gradient estimators of the log-likelihood. We obtain the unbiased estimators by augmenting the latent space with a set of importance samples, similarly to the importance weighted auto-encoder (IWAE), and then constructing a Markov chain Monte Carlo (MCMC) coupling procedure on this augmented space. We provide the conditions under which the estimators can be computed in finite time and have finite variance. We demonstrate experimentally that VAEs fitted with unbiased estimators exhibit better predictive performance on three image datasets.

* 16 pages, 4 figures

Prescribed Generative Adversarial Networks

Oct 09, 2019

Adji B. Dieng, Francisco J. R. Ruiz, David M. Blei, Michalis K. Titsias

Abstract: Generative adversarial networks (GANs) are a powerful approach to unsupervised learning. They have achieved state-of-the-art performance in the image domain. However, GANs are limited in two ways. They often learn distributions with low support (a phenomenon known as mode collapse), and they do not guarantee the existence of a probability density, which makes evaluating generalization using predictive log-likelihood impossible. In this paper, we develop the prescribed GAN (PresGAN) to address these shortcomings. PresGANs add noise to the output of a density network and optimize an entropy-regularized adversarial loss. The added noise renders tractable approximations of the predictive log-likelihood and stabilizes the training procedure. The entropy regularizer encourages PresGANs to capture all the modes of the data distribution. Fitting PresGANs involves computing the intractable gradients of the entropy regularization term; PresGANs sidestep this intractability using unbiased stochastic estimates. We evaluate PresGANs on several datasets and find that they mitigate mode collapse and generate samples with high perceptual quality. We further find that PresGANs reduce the gap in performance in terms of predictive log-likelihood between traditional GANs and variational autoencoders (VAEs).

* Code for this paper can be found at https://github.com/adjidieng/PresGANs

The Dynamic Embedded Topic Model

Jul 12, 2019

Adji B. Dieng, Francisco J. R. Ruiz, David M. Blei

Abstract: Topic modeling analyzes documents to learn meaningful patterns of words. Dynamic topic models capture how these patterns vary over time for a set of documents that were collected over a large time span. We develop the dynamic embedded topic model (D-ETM), a generative model of documents that combines dynamic latent Dirichlet allocation (D-LDA) and word embeddings. The D-ETM models each word with a categorical distribution whose parameter is given by the inner product between the word embedding and an embedding representation of its assigned topic at a particular time step. The word embeddings allow the D-ETM to generalize to rare words. The D-ETM learns smooth topic trajectories by defining a random walk prior over the embeddings of the topics. We fit the D-ETM using structured amortized variational inference. On a collection of United Nations debates, we find that the D-ETM learns interpretable topics and outperforms D-LDA in terms of both topic quality and predictive performance.

Topic Modeling in Embedding Spaces

Jul 08, 2019

Adji B. Dieng, Francisco J. R. Ruiz, David M. Blei

Abstract: Topic modeling analyzes documents to learn meaningful patterns of words. However, existing topic models fail to learn interpretable topics when working with large and heavy-tailed vocabularies. To this end, we develop the Embedded Topic Model (ETM), a generative model of documents that marries traditional topic models with word embeddings. In particular, it models each word with a categorical distribution whose natural parameter is the inner product between a word embedding and an embedding of its assigned topic. To fit the ETM, we develop an efficient amortized variational inference algorithm. The ETM discovers interpretable topics even with large vocabularies that include rare words and stop words. It outperforms existing document models, such as latent Dirichlet allocation (LDA), in terms of both topic quality and predictive performance.
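The word model described in the abstract can be sketched in a few lines: each topic's distribution over the vocabulary is a softmax of the inner products between the word embeddings and that topic's embedding. The snippet below is a minimal illustration with random embeddings, not the authors' released implementation; the names `rho`, `alpha`, and `topic_word_distributions` and the tiny dimensions are assumptions made for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def topic_word_distributions(rho, alpha):
    """Categorical word distribution for each topic.

    rho:   shape (V, L), one L-dimensional embedding per vocabulary word
    alpha: shape (K, L), one L-dimensional embedding per topic
    Returns an array of shape (K, V) whose rows sum to one.
    """
    natural_params = alpha @ rho.T  # inner product of each topic with each word
    return softmax(natural_params, axis=1)


# Tiny random example: 6 words, 2 topics, 3-dimensional embedding space.
rng = np.random.default_rng(0)
rho = rng.normal(size=(6, 3))
alpha = rng.normal(size=(2, 3))
beta = topic_word_distributions(rho, alpha)
```

Because the topics live in the same space as the words, a rare word with a well-placed embedding still receives sensible probability under nearby topics, which is the intuition behind the claim about large vocabularies.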

* Code can be found at https://github.com/adjidieng/ETM

A Contrastive Divergence for Combining Variational Inference and MCMC

May 28, 2019

Francisco J. R. Ruiz, Michalis K. Titsias

Abstract: We develop a method to combine Markov chain Monte Carlo (MCMC) and variational inference (VI), leveraging the advantages of both inference approaches. Specifically, we improve the variational distribution by running a few MCMC steps. To make inference tractable, we introduce the variational contrastive divergence (VCD), a new divergence that replaces the standard Kullback-Leibler (KL) divergence used in VI. The VCD captures a notion of discrepancy between the initial variational distribution and its improved version (obtained after running the MCMC steps), and it converges asymptotically to the symmetrized KL divergence between the variational distribution and the posterior of interest. The VCD objective can be optimized efficiently with respect to the variational parameters via stochastic optimization. We show experimentally that optimizing the VCD leads to better predictive performance on two latent variable models: logistic matrix factorization and variational autoencoders (VAEs).

* International Conference on Machine Learning (ICML 2019). 12 pages, 3 figures
