Grégoire Mialon

WorldSense: A Synthetic Benchmark for Grounded Reasoning in Large Language Models

Nov 27, 2023

Youssef Benchekroun, Megi Dervishi, Mark Ibrahim, Jean-Baptiste Gaya, Xavier Martinet, Grégoire Mialon, Thomas Scialom, Emmanuel Dupoux, Dieuwke Hupkes, Pascal Vincent

Abstract: We propose WorldSense, a benchmark designed to assess the extent to which LLMs are consistently able to sustain tacit world models, by testing how they draw simple inferences from descriptions of simple arrangements of entities. WorldSense is a synthetic benchmark with three problem types, each with its own trivial control, which explicitly avoids bias by decorrelating the abstract structure of problems from the vocabulary and expressions, and by decorrelating all problem subparts from the correct response. We run our benchmark on three state-of-the-art chat-LLMs (GPT3.5, GPT4 and Llama2-chat) and show that these models make errors even with as few as three objects. Furthermore, they have quite heavy response biases, preferring certain responses irrespective of the question. Errors persist even with chain-of-thought prompting and in-context learning. Lastly, we show that while finetuning on similar problems does result in substantial improvements, both within- and out-of-distribution, the finetuned models do not generalise beyond a constrained problem space.

GAIA: a benchmark for General AI Assistants

Nov 21, 2023

Grégoire Mialon, Clémentine Fourrier, Craig Swift, Thomas Wolf, Yann LeCun, Thomas Scialom

Abstract: We introduce GAIA, a benchmark for General AI Assistants that, if solved, would represent a milestone in AI research. GAIA proposes real-world questions that require a set of fundamental abilities such as reasoning, multi-modality handling, web browsing, and general tool-use proficiency. GAIA questions are conceptually simple for humans yet challenging for most advanced AIs: we show that human respondents obtain 92% vs. 15% for GPT-4 equipped with plugins. This notable performance disparity contrasts with the recent trend of LLMs outperforming humans on tasks requiring professional skills in e.g. law or chemistry. GAIA's philosophy departs from the current trend in AI benchmarks of targeting tasks that are ever more difficult for humans. We posit that the advent of Artificial General Intelligence (AGI) hinges on a system's capability to exhibit similar robustness as the average human does on such questions. Using GAIA's methodology, we devise 466 questions and their answers. We release our questions while retaining answers to 300 of them to power a leader-board available at https://huggingface.co/gaia-benchmark.

Self-Supervised Learning with Lie Symmetries for Partial Differential Equations

Jul 11, 2023

Grégoire Mialon, Quentin Garrido, Hannah Lawrence, Danyal Rehman, Yann LeCun, Bobak T. Kiani

Abstract: Machine learning for differential equations paves the way for computationally efficient alternatives to numerical solvers, with potentially broad impacts in science and engineering. Though current algorithms typically require simulated training data tailored to a given setting, one may instead wish to learn useful information from heterogeneous sources, or from real dynamical systems observations that are messy or incomplete. In this work, we learn general-purpose representations of PDEs from heterogeneous data by implementing joint embedding methods for self-supervised learning (SSL), a framework for unsupervised representation learning that has had notable success in computer vision. Our representation outperforms baseline approaches to invariant tasks, such as regressing the coefficients of a PDE, while also improving the time-stepping performance of neural solvers. We hope that our proposed methodology will prove useful in the eventual development of general-purpose foundation models for PDEs.

On Inductive Biases for Machine Learning in Data Constrained Settings

Feb 21, 2023

Grégoire Mialon

Abstract: Learning with limited data is one of the biggest problems of machine learning. Current approaches to this issue consist in learning general representations from huge amounts of data before fine-tuning the model on a small dataset of interest. While such a technique, coined transfer learning, is very effective in domains such as computer vision or natural language processing, it does not yet solve common problems of deep learning such as model interpretability or the overall need for data. This thesis explores a different answer to the problem of learning expressive models in data constrained settings: instead of relying on big datasets to learn neural networks, we will replace some modules by known functions reflecting the structure of the data. Very often, these functions will be drawn from the rich literature of kernel methods. Indeed, many kernels can reflect the underlying structure of the data, thus sparing learning parameters to some extent. Our approach falls under the umbrella of "inductive biases", which can be defined as hypotheses about the data at hand that restrict the space of models to explore during learning. We demonstrate the effectiveness of this approach in the context of sequences, such as sentences in natural language or protein sequences, and graphs, such as molecules. We also highlight the relationship between our work and recent advances in deep learning. Additionally, we study convex machine learning models. Here, rather than proposing new models, we ask what proportion of the samples in a dataset is really needed to learn a "good" model. More precisely, we study the problem of safe sample screening, i.e., executing simple tests to discard uninformative samples from a dataset even before fitting a machine learning model, without affecting the optimal model. Such techniques can be used to prune datasets or mine for rare samples.

* PhD thesis defended on January 19th, 2022

Augmented Language Models: a Survey

Feb 15, 2023

Grégoire Mialon, Roberto Dessì, Maria Lomeli, Christoforos Nalmpantis, Ram Pasunuru, Roberta Raileanu, Baptiste Rozière, Timo Schick, Jane Dwivedi-Yu, Asli Celikyilmaz(+3 more)

Abstract: This survey reviews works in which language models (LMs) are augmented with reasoning skills and the ability to use tools. The former is defined as decomposing a potentially complex task into simpler subtasks while the latter consists in calling external modules such as a code interpreter. LMs can leverage these augmentations separately or in combination via heuristics, or learn to do so from demonstrations. While adhering to a standard missing-token prediction objective, such augmented LMs can use various, possibly non-parametric external modules to expand their context processing ability, thus departing from the pure language modeling paradigm. We therefore refer to them as Augmented Language Models (ALMs). The missing-token objective allows ALMs to learn to reason, use tools, and even act, while still performing standard natural language tasks and even outperforming most regular LMs on several benchmarks. In this work, after reviewing current advances in ALMs, we conclude that this new research direction has the potential to address common limitations of traditional LMs such as interpretability, consistency, and scalability issues.

Variance Covariance Regularization Enforces Pairwise Independence in Self-Supervised Representations

Sep 29, 2022

Grégoire Mialon, Randall Balestriero, Yann LeCun

Abstract: Self-Supervised Learning (SSL) methods such as VICReg, Barlow Twins or W-MSE avoid collapse of their joint embedding architectures by constraining or regularizing the covariance matrix of their projector's output. This study highlights important properties of this strategy, which we coin Variance-Covariance regularization (VCReg). More precisely, we show that VCReg enforces pairwise independence between the features of the learned representation. This result emerges by bridging VCReg applied on the projector's output to kernel independence criteria applied on the projector's input. This provides the first theoretical motivations and explanations of VCReg. We empirically validate our findings where (i) we observe that SSL methods employing VCReg learn visual representations with greater pairwise independence than other methods, (ii) we identify which characteristics of the projector favor pairwise independence, and show it to emerge independently from learning the projector, (iii) we use these findings to obtain nontrivial performance gains for VICReg, (iv) we demonstrate that the scope of VCReg goes beyond SSL by using it to solve Independent Component Analysis. We hope that our findings will support the adoption of VCReg in SSL and beyond.
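
To make the idea of regularizing the covariance matrix of the projector's output concrete, here is a minimal sketch of a variance term and a covariance term computed on a batch of embeddings. It is an illustration, not the authors' code: the function name `vc_regularization`, the hinge target `gamma`, the stabilizer `eps`, and the normalization by the feature dimension are assumptions, loosely modeled on how such penalties are commonly written for VICReg-style objectives.

```python
import math


def vc_regularization(z, gamma=1.0, eps=1e-4):
    """Return (variance_loss, covariance_loss) for a batch of embeddings.

    z is a list of n rows, each a list of d floats (hypothetical layout).
    gamma and eps are illustrative defaults, not values from the abstract.
    """
    n, d = len(z), len(z[0])

    # Center each feature over the batch.
    means = [sum(row[j] for row in z) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in z]

    # Unbiased d x d covariance matrix of the features.
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]

    # Variance term: hinge that pushes each feature's std up to gamma,
    # which penalizes collapsed (constant) features.
    var_loss = sum(
        max(0.0, gamma - math.sqrt(cov[j][j] + eps)) for j in range(d)
    ) / d

    # Covariance term: squared off-diagonal entries, which penalizes
    # linear correlation between distinct features.
    cov_loss = sum(
        cov[i][j] ** 2 for i in range(d) for j in range(d) if i != j
    ) / d

    return var_loss, cov_loss


# A collapsed batch is penalized by the variance term only,
# a perfectly correlated batch by the covariance term only.
collapsed = [[1.0, 2.0]] * 4
correlated = [[1.0, 1.0], [-1.0, -1.0], [2.0, 2.0], [-2.0, -2.0]]
print(vc_regularization(collapsed))
print(vc_regularization(correlated))
```

Note that this sketch only penalizes linear correlation at the output it is applied to; the abstract's claim about pairwise independence concerns the relationship between the projector's output and its input, which the sketch does not attempt to reproduce.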

GraphiT: Encoding Graph Structure in Transformers

Jun 10, 2021

Grégoire Mialon, Dexiong Chen, Margot Selosse, Julien Mairal

Abstract: We show that viewing graphs as sets of node features and incorporating structural and positional information into a transformer architecture is able to outperform representations learned with classical graph neural networks (GNNs). Our model, GraphiT, encodes such information by (i) leveraging relative positional encoding strategies in self-attention scores based on positive definite kernels on graphs, and (ii) enumerating and encoding local sub-structures such as paths of short length. We thoroughly evaluate these two ideas on many classification and regression tasks, demonstrating the effectiveness of each of them independently, as well as their combination. In addition to performing well on standard benchmarks, our model also admits natural visualization mechanisms for interpreting graph motifs explaining the predictions, making it a potentially strong candidate for scientific applications where interpretation is important. Code available at https://github.com/inria-thoth/GraphiT.
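
One way to picture point (i), relative positional encoding in self-attention scores via a kernel on the graph, is to multiply the exponentiated attention scores elementwise by a kernel matrix before normalizing each row. The sketch below assumes a p-step random walk kernel built from the normalized Laplacian; the helper names `random_walk_kernel` and `kernel_attention`, the choice of kernel, and the parameters `gamma` and `steps` are illustrative assumptions and are not taken from the abstract or from the GraphiT repository.

```python
import math


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def random_walk_kernel(adj, gamma=0.5, steps=2):
    """(I - gamma * L)^steps, with L the symmetric normalized Laplacian.

    Assumes an undirected graph with no isolated nodes and steps >= 1.
    """
    n = len(adj)
    deg = [sum(row) for row in adj]
    base = [
        [
            (1.0 - gamma) * (1.0 if i == j else 0.0)
            + gamma * adj[i][j] / math.sqrt(deg[i] * deg[j])
            for j in range(n)
        ]
        for i in range(n)
    ]
    kernel = base
    for _ in range(steps - 1):
        kernel = matmul(kernel, base)
    return kernel


def kernel_attention(q, k, v, kernel):
    """Attention whose exponentiated scores are reweighted by a graph kernel."""
    n, d = len(q), len(q[0])
    out = []
    for i in range(n):
        weights = [
            math.exp(sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d))
            * kernel[i][j]
            for j in range(n)
        ]
        total = sum(weights)
        out.append(
            [
                sum(w * v[j][c] for j, w in enumerate(weights)) / total
                for c in range(len(v[0]))
            ]
        )
    return out


# Path graph 0 - 1 - 2 with identical queries and keys, so only the
# kernel decides how much each node attends to the others.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
q = k = [[0.0], [0.0], [0.0]]
v = [[0.0], [0.0], [1.0]]
print(kernel_attention(q, k, v, random_walk_kernel(adj, steps=1)))
print(kernel_attention(q, k, v, random_walk_kernel(adj, steps=2)))
```

With a one-step kernel, node 0 gives zero weight to node 2 because they are not adjacent; with two steps it starts to attend to it. This is the sense in which the kernel injects graph structure into otherwise position-agnostic attention.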

An Optimal Transport Kernel for Feature Aggregation and its Relationship to Attention

Jun 23, 2020

Grégoire Mialon, Dexiong Chen, Alexandre d'Aspremont, Julien Mairal

Abstract: We introduce a kernel for sets of features based on an optimal transport distance, along with an explicit embedding function. Our approach addresses the problem of feature aggregation, or pooling, for sets that exhibit long-range dependencies between their members. More precisely, our embedding aggregates the features of a given set according to the transport plan between the set and a reference shared across the data set. Unlike traditional hand-crafted kernels, our embedding can be optimized for a specific task or data set. It also has a natural connection to attention mechanisms in neural networks, which are commonly used to deal with sets, yet requires less data. Our embedding is particularly suited for biological sequence classification tasks and shows promising results for natural language sequences. We provide an implementation of our embedding that can be used alone or as a module in larger learning models. Our code is freely available at https://github.com/claying/OTK.
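
The aggregation step described above, pooling a set's features according to the transport plan between the set and a shared reference, can be sketched with entropy-regularized optimal transport solved by Sinkhorn iterations. This is a simplified illustration under stated assumptions, not the OTK implementation: the squared Euclidean cost, the uniform marginals, the rescaling by the reference size, and the names `sinkhorn_plan` and `ot_pool` are all choices made here for readability.

```python
import math


def sinkhorn_plan(cost, reg=0.5, n_iters=200):
    """Entropy-regularized transport plan between uniform marginals.

    cost is an n x p list of rows; reg and n_iters are illustrative defaults.
    """
    n, p = len(cost), len(cost[0])
    gibbs = [[math.exp(-cost[i][j] / reg) for j in range(p)] for i in range(n)]
    a, b = 1.0 / n, 1.0 / p
    u, v = [1.0] * n, [1.0] * p
    for _ in range(n_iters):
        # Alternate scalings so rows sum to 1/n and columns sum to 1/p.
        u = [a / sum(gibbs[i][j] * v[j] for j in range(p)) for i in range(n)]
        v = [b / sum(gibbs[i][j] * u[i] for i in range(n)) for j in range(p)]
    return [[u[i] * gibbs[i][j] * v[j] for j in range(p)] for i in range(n)]


def ot_pool(x, ref, reg=0.5, n_iters=200):
    """Pool a variable-size set x (n x d) into a fixed-size p x d output.

    Each reference element collects a weighted average of the set's
    features, with weights given by its column of the transport plan.
    """
    cost = [
        [sum((xi - ri) ** 2 for xi, ri in zip(x_row, r_row)) for r_row in ref]
        for x_row in x
    ]
    plan = sinkhorn_plan(cost, reg, n_iters)
    p, d = len(ref), len(x[0])
    # Multiply by p so that the weights in each column sum to one.
    return [
        [p * sum(plan[i][j] * x[i][c] for i in range(len(x))) for c in range(d)]
        for j in range(p)
    ]


# Two clusters of points pooled against a two-element reference:
# the output has one row per reference element, whatever the set size.
x = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
ref = [[0.0, 0.0], [5.0, 5.0]]
print(ot_pool(x, ref))
```

Because the output size depends only on the reference, sets of different lengths map to embeddings of the same shape. In the abstract's setting the reference is optimized for the task; here it is fixed by hand.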

Screening Data Points in Empirical Risk Minimization via Ellipsoidal Regions and Safe Loss Function

Dec 05, 2019

Grégoire Mialon, Alexandre d'Aspremont, Julien Mairal

Abstract: We design simple screening tests to automatically discard data samples in empirical risk minimization without losing optimization guarantees. We derive loss functions that produce dual objectives with a sparse solution. We also show how to regularize convex losses to ensure such a dual sparsity-inducing property, and propose a general method to design screening tests for classification or regression based on ellipsoidal approximations of the optimal set. In addition to producing computational gains, our approach also allows us to compress a dataset into a subset of representative points.

On Regularization and Robustness of Deep Neural Networks

Sep 30, 2018

Alberto Bietti, Grégoire Mialon, Julien Mairal

Abstract: Despite their success, deep neural networks suffer from several drawbacks: they lack robustness to small changes of input data known as "adversarial examples" and training them with small amounts of annotated data is challenging. In this work, we study the connection between regularization and robustness by viewing neural networks as elements of a reproducing kernel Hilbert space (RKHS) of functions and by regularizing them using the RKHS norm. Even though this norm cannot be computed, we consider various approximations based on upper and lower bounds. These approximations lead to new strategies for regularization, but also to existing ones such as spectral norm penalties or constraints, gradient penalties, or adversarial training. Besides, the kernel framework allows us to obtain margin-based bounds on adversarial generalization. We study the obtained algorithms for learning on small datasets, learning adversarially robust models, and discuss implications for learning implicit generative models.
