Alessandro Leite

TAU

Evolutionary Pre-Prompt Optimization for Mathematical Reasoning

Dec 05, 2024

Mathurin Videau, Alessandro Leite, Marc Schoenauer, Olivier Teytaud

Abstract: Recent advancements have highlighted that large language models (LLMs), when given a small set of task-specific examples, demonstrate remarkable proficiency, a capability that extends to complex reasoning tasks. In particular, the combination of few-shot learning with the chain-of-thought (CoT) approach has been pivotal in steering models towards more logically consistent conclusions. This paper explores the optimization of example selection for designing effective CoT pre-prompts and shows that the choice of the optimization algorithm, typically in favor of comparison-based methods such as evolutionary computation, significantly enhances efficacy and feasibility. Specifically, thanks to an optimization with limited exploitation and limited overfitting, Evolutionary Pre-Prompt Optimization (EPPO) brings an improvement over the naive few-shot approach exceeding 10 absolute points in exact match scores on benchmark datasets such as GSM8k and MathQA. These gains are consistent across various contexts and are further amplified when integrated with self-consistency (SC).
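The example-selection idea in this abstract can be illustrated with a minimal sketch. It is not the authors' EPPO implementation: it assumes a simple (1+1)-style comparison-based search over subsets of demonstration indices, and `toy_score` and `usefulness` are hypothetical stand-ins for the exact-match accuracy an LLM would obtain with the selected demonstrations as its pre-prompt.

```python
import random

def one_plus_one_select(pool_size, k, score, budget, seed=0):
    """(1+1)-style comparison-based search over k-subsets of demonstration indices."""
    rng = random.Random(seed)
    parent = rng.sample(range(pool_size), k)
    parent_score = score(parent)
    for _ in range(budget):
        child = list(parent)
        # Mutation: swap one selected demonstration for an unselected one.
        slot = rng.randrange(k)
        candidates = [i for i in range(pool_size) if i not in child]
        child[slot] = rng.choice(candidates)
        child_score = score(child)
        # Only the outcome of the comparison is used, never a gradient.
        if child_score >= parent_score:
            parent, parent_score = child, child_score
    return sorted(parent), parent_score

# Hypothetical per-demonstration usefulness; a real run would instead query
# the LLM on a training split and measure exact match.
usefulness = [0.1, 0.9, 0.3, 0.8, 0.2, 0.7, 0.05, 0.6]

def toy_score(indices):
    return sum(usefulness[i] for i in indices)

best, best_score = one_plus_one_select(len(usefulness), k=3, score=toy_score, budget=200)
print(best, round(best_score, 2))
```

Because the search only ever asks which of two pre-prompts scored better, it extracts little information from the training data per evaluation, which is one intuition for why overfitting stays limited.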

Mixture of Experts in Image Classification: What's the Sweet Spot?

Nov 27, 2024

Mathurin Videau, Alessandro Leite, Marc Schoenauer, Olivier Teytaud

Abstract: Mixture-of-Experts (MoE) models have shown promising potential for parameter-efficient scaling across various domains. However, their implementation in computer vision remains limited and often requires large-scale datasets comprising billions of samples. In this study, we investigate the integration of MoE within computer vision models and explore various MoE configurations on open datasets. When introducing MoE layers in image classification, the best results are obtained for models with a moderate number of activated parameters per sample. However, such improvements gradually vanish when the number of parameters per sample increases.
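To make the notion of "activated parameters per sample" concrete, here is a minimal sketch of top-k expert routing. It is a generic illustration, not the architecture studied in the paper: the scalar experts, `params_per_expert`, and `router_logits` are all hypothetical.

```python
import math

def top_k_route(logits, k):
    """Return (expert_index, weight) pairs for the k largest router logits."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - m) for i in top]  # softmax over selected experts only
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_forward(x, experts, router_logits, k):
    """Weighted sum of the outputs of the k selected experts; the rest are skipped."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))

# Four toy scalar "experts"; each is assumed to hold params_per_expert parameters.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
params_per_expert = 1000
router_logits = [0.1, 2.0, -1.0, 2.0]  # hypothetical router output for one sample

k = 2
y = moe_forward(1.0, experts, router_logits, k)
total_params = len(experts) * params_per_expert   # stored in the model
active_params = k * params_per_expert             # actually used for this sample
print(y, total_params, active_params)
```

The total parameter count grows with the number of experts, while the per-sample compute depends only on `k`, which is the quantity the abstract refers to.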

Evolutionary Retrofitting

Oct 15, 2024

Mathurin Videau, Mariia Zameshina, Alessandro Leite, Laurent Najman, Marc Schoenauer, Olivier Teytaud

Abstract: AfterLearnER (After Learning Evolutionary Retrofitting) consists of applying non-differentiable optimization, including evolutionary methods, to refine fully trained machine learning models by optimizing a set of carefully chosen parameters or hyperparameters of the model with respect to some actual, exact, and hence possibly non-differentiable error signal, computed on a subset of the standard validation set. The efficiency of AfterLearnER is demonstrated by tackling non-differentiable signals such as threshold-based criteria in depth sensing, the word error rate in speech re-synthesis, image quality in 3D generative adversarial networks (GANs), image generation via Latent Diffusion Models (LDM), the number of kills per life at Doom, computational accuracy or BLEU in code translation, and human appreciations in image synthesis. In some cases, this retrofitting is performed dynamically at inference time by taking into account user inputs. The advantages of AfterLearnER are its versatility (no gradient is needed), the possibility of using non-differentiable feedback including human evaluations, its limited overfitting (supported by a theoretical study), and its anytime behavior. Last but not least, AfterLearnER requires only a minimal amount of feedback, i.e., a few dozen to a few hundred scalars, rather than the tens of thousands needed in most related published works. Compared to fine-tuning (typically using the same loss, and gradient-based optimization on a smaller but still big dataset at a fine grain), AfterLearnER uses a minimum amount of data on the real objective function without requiring differentiability.
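The retrofitting loop described above can be sketched on a toy case. This is an illustrative assumption, not the paper's code: the "frozen model" is reduced to fixed scores on a small validation subset, the single retrofitted parameter is a decision threshold, and the optimizer is a plain fixed-step (1+1) evolution strategy. All numbers are hypothetical.

```python
import random

# Frozen model outputs on a small validation subset (hypothetical numbers).
scores = [0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.80, 0.90]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

def error_count(threshold):
    """Non-differentiable feedback: number of misclassified samples."""
    return sum(int((s >= threshold) != bool(y)) for s, y in zip(scores, labels))

def retrofit(loss, x0, sigma=0.1, budget=50, seed=0):
    """Fixed-step (1+1) evolution strategy; uses `budget` scalar evaluations."""
    rng = random.Random(seed)
    best_x, best_loss = x0, loss(x0)
    for _ in range(budget):
        cand = best_x + sigma * rng.gauss(0.0, 1.0)
        cand_loss = loss(cand)
        # Keep the candidate only if it is at least as good (elitist selection).
        if cand_loss <= best_loss:
            best_x, best_loss = cand, cand_loss
    return best_x, best_loss

initial_errors = error_count(0.5)
threshold, final_errors = retrofit(error_count, x0=0.5)
print(initial_errors, final_errors)
```

The loop can be stopped at any iteration and still returns the best threshold seen so far, which mirrors the anytime behavior mentioned in the abstract; the budget of 50 scalars is in the "few dozen" range it cites.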

Learning Structural Causal Models through Deep Generative Models: Methods, Guarantees, and Challenges

May 08, 2024

Audrey Poinsot, Alessandro Leite, Nicolas Chesneau, Michèle Sébag, Marc Schoenauer

Abstract: This paper provides a comprehensive review of deep structural causal models (DSCMs), particularly focusing on their ability to answer counterfactual queries using observational data within known causal structures. It delves into the characteristics of DSCMs by analyzing the hypotheses, guarantees, and applications inherent to the underlying deep learning components and structural causal models, fostering a finer understanding of their capabilities and limitations in addressing different counterfactual queries. Furthermore, it highlights the challenges and open questions in the field of deep structural causal modeling. It sets the stage for researchers to identify future work directions and for practitioners to obtain an overview and identify the most appropriate methods for their needs.

* Accepted to the 33rd International Joint Conference on Artificial Intelligence

Conformal Approach To Gaussian Process Surrogate Evaluation With Coverage Guarantees

Jan 15, 2024

Edgar Jaber, Vincent Blot, Nicolas Brunel, Vincent Chabridon, Emmanuel Remy, Bertrand Iooss, Didier Lucor, Mathilde Mougeot, Alessandro Leite

Abstract: Gaussian processes (GPs) are a Bayesian machine learning approach widely used to construct surrogate models for the uncertainty quantification of computer simulation codes in industrial applications. A GP provides both a mean predictor and an estimate of the posterior prediction variance, the latter being used to produce Bayesian credibility intervals. Interpreting these intervals relies on the Gaussianity of the simulation model as well as on well-specified priors, assumptions that are not always appropriate. We propose to address this issue with the help of conformal prediction. In the present work, a method for building adaptive cross-conformal prediction intervals is proposed by weighting the non-conformity score with the posterior standard deviation of the GP. The resulting conformal prediction intervals exhibit a level of adaptivity akin to Bayesian credibility sets and display a significant correlation with the surrogate model local approximation error, while being free from the underlying model assumptions and having frequentist coverage guarantees. These estimators can thus be used for evaluating the quality of a GP surrogate model and can assist a decision-maker in the choice of the best prior for the specific application of the GP. The performance of the method is illustrated through a panel of numerical examples based on various reference databases. Moreover, the potential applicability of the method is demonstrated in the context of surrogate modeling of an expensive-to-evaluate simulator of the clogging phenomenon in steam generators of nuclear reactors.
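The weighting of the non-conformity score by the posterior standard deviation can be sketched as follows. As a simplifying assumption, this uses split conformal prediction on a held-out calibration set rather than the cross-conformal scheme of the paper, and the calibration values (`y_cal`, `mu_cal`, `sigma_cal`) are hypothetical stand-ins for observations and GP posterior means and standard deviations.

```python
import math

def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of the calibration scores."""
    n = len(scores)
    # 1-based order statistic; the tiny tolerance guards against float round-up.
    rank = math.ceil((n + 1) * (1 - alpha) - 1e-12)
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[rank - 1]

def normalized_scores(y, mu, sigma):
    """Non-conformity score |y - mu| / sigma, weighted by the posterior std."""
    return [abs(yi - mi) / si for yi, mi, si in zip(y, mu, sigma)]

def interval(mu, sigma, q):
    """Adaptive interval: wider where the posterior std is larger."""
    return (mu - q * sigma, mu + q * sigma)

# Hypothetical calibration data: observations, GP posterior means and stds.
y_cal = [0.5, -1.0, 1.5, 0.4, -0.6, 2.0, 0.3, -0.9, 1.2]
mu_cal = [0.0] * 9
sigma_cal = [1.0, 1.0, 1.0, 0.5, 0.5, 2.0, 0.5, 1.0, 2.0]

alpha = 0.2
q = conformal_quantile(normalized_scores(y_cal, mu_cal, sigma_cal), alpha)
low, high = interval(10.0, 0.5, q)  # new point with posterior mean 10.0, std 0.5
print(q, low, high)
```

The interval half-width is `q * sigma`, so it inherits the local adaptivity of the GP posterior, while `q` is calibrated from data rather than taken from a Gaussian quantile.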
