Eleni Sgouritsa

Max Planck Institute for Intelligent Systems

Evaluating Gemini in an arena for learning

May 30, 2025

LearnLM Team, Abhinit Modi, Aditya Srikanth Veerubhotla, Aliya Rysbek, Andrea Huber, Ankit Anand, Avishkar Bhoopchand, Brett Wiltshire, Daniel Gillick, Daniel Kasenberg(+27 more)

Abstract: Artificial intelligence (AI) is poised to transform education, but the research community lacks a robust, general benchmark for evaluating AI models for learning. To assess state-of-the-art support for educational use cases, we ran an "arena for learning" in which educators and pedagogy experts conducted blind, head-to-head, multi-turn comparisons of leading AI models. In particular, N = 189 educators drew on their experience to role-play realistic learning use cases, interacting with two models sequentially, after which N = 206 experts judged which model better supported the user's learning goals. The arena evaluated a slate of state-of-the-art models: Gemini 2.5 Pro, Claude 3.7 Sonnet, GPT-4o, and OpenAI o3. Excluding ties, experts preferred Gemini 2.5 Pro in 73.2% of these match-ups, ranking it first overall in the arena. Gemini 2.5 Pro also demonstrated markedly higher performance across key principles of good pedagogy. Altogether, these results position Gemini 2.5 Pro as a leading model for learning.
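A preference rate "excluding ties" counts only decided comparisons: wins divided by wins plus losses. The following is a minimal sketch of that arithmetic, assuming each expert judgment is recorded as a hypothetical `(model_a, model_b, winner)` tuple with `winner = None` for a tie. The record format, the function name and the model labels are illustrative assumptions; this is not the arena's analysis code, and it does not reproduce how the overall ranking was computed.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical record format: (model_a, model_b, winner), winner=None means a tie.
Match = Tuple[str, str, Optional[str]]


def win_rates_excluding_ties(matches: Iterable[Match]) -> Dict[str, float]:
    """Per-model share of decided (non-tied) comparisons that the model won."""
    wins: Dict[str, int] = defaultdict(int)
    decided: Dict[str, int] = defaultdict(int)
    for model_a, model_b, winner in matches:
        if winner is None:
            continue  # ties are dropped from both numerator and denominator
        if winner not in (model_a, model_b):
            raise ValueError(f"winner {winner!r} did not play in this match")
        decided[model_a] += 1
        decided[model_b] += 1
        wins[winner] += 1
    return {model: wins[model] / decided[model] for model in decided}


# Made-up judgments between placeholder models "A", "B" and "C".
example = [("A", "B", "A"), ("A", "C", "A"), ("A", "B", None),
           ("B", "C", "C"), ("A", "C", "C")]
print(win_rates_excluding_ties(example))
```

Here "A" wins 2 of its 3 decided comparisons; its tie with "B" does not enter the rate at all.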

Prompting Strategies for Enabling Large Language Models to Infer Causation from Correlation

Dec 18, 2024

Eleni Sgouritsa, Virginia Aglietti, Yee Whye Teh, Arnaud Doucet, Arthur Gretton, Silvia Chiappa

Abstract: The reasoning abilities of Large Language Models (LLMs) are attracting increasing attention. In this work, we focus on causal reasoning and address the task of establishing causal relationships from correlation information, a highly challenging problem on which several LLMs have shown poor performance. We introduce a prompting strategy for this problem that breaks the original task into fixed subquestions, each corresponding to one step of a formal causal discovery algorithm, the PC algorithm. The proposed strategy, PC-SubQ, guides the LLM through these algorithmic steps by prompting it with one subquestion at a time, augmenting each subquestion's prompt with the answers to the previous ones. We evaluate our approach on an existing causal benchmark, Corr2Cause: our experiments indicate a performance improvement across five LLMs when comparing PC-SubQ to baseline prompting strategies. The results are robust to perturbations of the causal queries, such as renaming the variables or paraphrasing the expressions.
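The sequential scheme described above (fixed subquestions asked one at a time, each prompt carrying the earlier answers) can be sketched as a simple loop. The subquestion wording below is a hypothetical paraphrase of the standard PC steps (complete graph, skeleton from conditional independences, v-structures, further orientation), not the paper's actual prompts, and `ask_llm` stands in for any text-in, text-out model call.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical wording, one subquestion per PC-algorithm step.
PC_SUBQUESTIONS: Tuple[str, ...] = (
    "List the variables and start from a complete undirected graph over them.",
    "Using the stated (conditional) independences, remove edges to get the skeleton.",
    "Identify the v-structures (colliders) A -> C <- B in the skeleton.",
    "Orient as many remaining edges as possible without new v-structures or cycles.",
    "Given the resulting graph, is the hypothesis valid? Answer 'yes' or 'no'.",
)


def pc_subq(premise: str, hypothesis: str, ask_llm: Callable[[str], str],
            subquestions: Sequence[str] = PC_SUBQUESTIONS) -> Tuple[str, List[str]]:
    """Ask the subquestions in order, feeding all previous answers into each prompt."""
    answers: List[str] = []
    for step, question in enumerate(subquestions, start=1):
        history = "".join(
            f"Q{i}: {q}\nA{i}: {a}\n"
            for i, (q, a) in enumerate(zip(subquestions, answers), start=1)
        )
        prompt = (f"Premise: {premise}\nHypothesis: {hypothesis}\n"
                  f"{history}Q{step}: {question}\nA{step}:")
        answers.append(ask_llm(prompt).strip())
    # The last subquestion asks for the final verdict on the hypothesis.
    return answers[-1], answers
```

Because the model only ever sees one step at a time plus the accumulated answers, each call stays small, and an error at one step is visible in the transcript of intermediate answers.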

FunBO: Discovering Acquisition Functions for Bayesian Optimization with FunSearch

Jun 07, 2024

Virginia Aglietti, Ira Ktena, Jessica Schrouff, Eleni Sgouritsa, Francisco J. R. Ruiz, Alexis Bellot, Silvia Chiappa

Abstract: The sample efficiency of Bayesian optimization algorithms depends on carefully crafted acquisition functions (AFs) guiding the sequential collection of function evaluations. The best-performing AF can vary significantly across optimization problems, often requiring ad-hoc and problem-specific choices. This work tackles the challenge of designing novel AFs that perform well across a variety of experimental settings. Based on FunSearch, a recent work using Large Language Models (LLMs) for discovery in mathematical sciences, we propose FunBO, an LLM-based method that can be used to learn new AFs written in computer code by leveraging access to a limited number of evaluations for a set of objective functions. We provide the analytic expression of all discovered AFs and evaluate them on various global optimization benchmarks and hyperparameter optimization tasks. We show how FunBO identifies AFs that generalize well in and out of the training distribution of functions, thus outperforming established general-purpose AFs and achieving competitive performance against AFs that are customized to specific function types and are learned via transfer-learning algorithms.
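To make "AFs written in computer code" concrete, here is a sketch of two established general-purpose AFs, expected improvement (EI) and upper confidence bound (UCB), written as plain functions of a surrogate's posterior mean `mu` and standard deviation `sigma` at a candidate point. These are textbook baselines, not AFs discovered by FunBO. A maximization convention is assumed, and the names `xi`, `beta` and the `select_next` helper are illustrative choices.

```python
import math
from typing import Callable, Sequence, Tuple


def expected_improvement(mu: float, sigma: float, best: float, xi: float = 0.0) -> float:
    """EI for maximization: E[max(f(x) - best - xi, 0)] under a Gaussian posterior."""
    improvement = mu - best - xi
    if sigma <= 0.0:
        return max(improvement, 0.0)  # no uncertainty: improvement is known exactly
    z = improvement / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal PDF
    return improvement * cdf + sigma * pdf


def upper_confidence_bound(mu: float, sigma: float, best: float, beta: float = 2.0) -> float:
    """UCB ignores the incumbent `best`; it is accepted only to share EI's signature."""
    return mu + beta * sigma


def select_next(candidates: Sequence[float],
                posterior: Callable[[float], Tuple[float, float]],
                acquisition: Callable[[float, float, float], float],
                best: float) -> float:
    """Return the candidate point that maximizes the acquisition function."""
    return max(candidates, key=lambda x: acquisition(*posterior(x), best))
```

Any function with the signature `(mu, sigma, best) -> score` can be passed to `select_next`, which is what makes program space a natural place to search for new AFs.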

Consistency of Causal Inference under the Additive Noise Model

Feb 05, 2014

Samory Kpotufe, Eleni Sgouritsa, Dominik Janzing, Bernhard Schölkopf

Abstract: We analyze a family of methods for statistical causal inference from samples under the so-called Additive Noise Model. While most work on the subject has concentrated on establishing the soundness of the Additive Noise Model, the statistical consistency of the resulting inference methods has received little attention. We derive general conditions under which the given family of inference methods consistently infers the causal direction in a nonparametric setting.
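Methods in this family regress each variable on the other nonparametrically and prefer the direction whose residuals look independent of the input, since Y = f(X) + N with N independent of X typically admits no such backward model. The sketch below is one member of such a family under assumptions chosen for illustration: k-nearest-neighbour regression (k = 10) and the biased HSIC estimator with Gaussian kernels and a median-heuristic bandwidth as the dependence score. It is not the specific estimator analyzed in the paper and it performs no significance test.

```python
import math
import random
from typing import List, Sequence, Tuple


def knn_residuals(inputs: Sequence[float], targets: Sequence[float], k: int = 10) -> List[float]:
    """Residuals of a k-nearest-neighbour regression of targets on inputs."""
    n = len(inputs)
    residuals = []
    for i in range(n):
        nearest = sorted(range(n), key=lambda j: abs(inputs[j] - inputs[i]))[:k]
        fitted = sum(targets[j] for j in nearest) / k
        residuals.append(targets[i] - fitted)
    return residuals


def _gaussian_gram(v: Sequence[float]) -> List[List[float]]:
    """Gaussian kernel matrix, bandwidth set to the median squared pairwise distance."""
    n = len(v)
    sq = [[(a - b) ** 2 for b in v] for a in v]
    pairs = sorted(sq[i][j] for i in range(n) for j in range(i + 1, n))
    bandwidth = pairs[len(pairs) // 2] or 1.0  # fall back to 1.0 if all values coincide
    return [[math.exp(-d / bandwidth) for d in row] for row in sq]


def hsic(x: Sequence[float], y: Sequence[float]) -> float:
    """Biased empirical HSIC, tr(K H L H) / n^2, expanded so no centering matrix is built."""
    n = len(x)
    K, L = _gaussian_gram(x), _gaussian_gram(y)
    cross = sum(K[i][j] * L[i][j] for i in range(n) for j in range(n))
    k_rows = [sum(row) for row in K]
    l_rows = [sum(row) for row in L]
    return (cross / n ** 2
            - 2.0 * sum(a * b for a, b in zip(k_rows, l_rows)) / n ** 3
            + sum(k_rows) * sum(l_rows) / n ** 4)


def anm_direction(x: Sequence[float], y: Sequence[float], k: int = 10) -> Tuple[str, float, float]:
    """Pick the direction whose regression residuals are less dependent on the input."""
    score_xy = hsic(x, knn_residuals(x, y, k))  # dependence left after fitting Y = f(X) + N
    score_yx = hsic(y, knn_residuals(y, x, k))  # dependence left after fitting X = g(Y) + N
    return ("X -> Y" if score_xy < score_yx else "Y -> X"), score_xy, score_yx


# Synthetic data with a known direction: Y = X^3 + uniform noise.
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(200)]
y = [xi ** 3 + rng.uniform(-0.2, 0.2) for xi in x]
print(anm_direction(x, y))
```

In this example the backward residuals are strongly heteroscedastic: near Y = 0 the cube root is steep, so X given Y is spread widely, while for |Y| close to 1 it is tightly pinned, which is the kind of dependence the HSIC score picks up.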

Identifying Finite Mixtures of Nonparametric Product Distributions and Causal Inference of Confounders

Sep 26, 2013

Eleni Sgouritsa, Dominik Janzing, Jonas Peters, Bernhard Schoelkopf

Abstract: We propose a kernel method to identify finite mixtures of nonparametric product distributions. It is based on a Hilbert space embedding of the joint distribution. The rank of the constructed tensor is equal to the number of mixture components. We present an algorithm to recover the components by partitioning the data points into clusters such that the variables are jointly conditionally independent given the cluster. This method can be used to identify finite confounders.

* Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI2013)
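The rank statement can be illustrated in the simplest discrete case. For discrete variables with the delta kernel, the embedding of a distribution is just its probability table, so a mixture of r product distributions over two variables gives a joint table P(x1, x2) = sum_k w_k p_k(x1) q_k(x2) whose matrix rank is r whenever the component marginals are linearly independent. The sketch below checks this with exact rational arithmetic. It covers only this two-variable, population-level analogue (no kernels on continuous data, no sampling noise, no recovery of components), and the helper names and example distributions are made up for illustration.

```python
from fractions import Fraction
from typing import List, Sequence


def matrix_rank(rows: Sequence[Sequence[Fraction]]) -> int:
    """Rank via Gauss-Jordan elimination in exact rational arithmetic."""
    m = [[Fraction(v) for v in row] for row in rows]
    if not m:
        return 0
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def mixture_joint(weights: Sequence[Fraction],
                  marginals_x1: Sequence[Sequence[Fraction]],
                  marginals_x2: Sequence[Sequence[Fraction]]) -> List[List[Fraction]]:
    """Joint table P(x1, x2) = sum_k w_k * p_k(x1) * q_k(x2) of a finite product mixture."""
    n1, n2 = len(marginals_x1[0]), len(marginals_x2[0])
    return [[sum(w * p[i] * q[j] for w, p, q in zip(weights, marginals_x1, marginals_x2))
             for j in range(n2)] for i in range(n1)]


F = Fraction
p = [[F(1, 2), F(1, 4), F(1, 4)], [F(1, 8), F(1, 8), F(3, 4)]]  # component marginals of X1
q = [[F(1, 3), F(1, 3), F(1, 3)], [F(1, 2), F(1, 2), F(0)]]     # component marginals of X2
print(matrix_rank(mixture_joint([F(1, 3), F(2, 3)], p, q)))  # two-component mixture
print(matrix_rank(mixture_joint([F(1)], p[:1], q[:1])))      # a single product distribution
```

With only two variables the rank counts components but does not pin them down uniquely; the abstract's use of a tensor over the joint distribution addresses this setting more generally.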

On Causal and Anticausal Learning

Jun 27, 2012

Bernhard Schoelkopf, Dominik Janzing, Jonas Peters, Eleni Sgouritsa, Kun Zhang, Joris Mooij

Abstract: We consider the problem of function estimation in the case where an underlying causal model can be inferred. This has implications for popular scenarios such as covariate shift, concept drift, transfer learning and semi-supervised learning. We argue that causal knowledge may facilitate some approaches for a given problem, and rule out others. In particular, we formulate a hypothesis for when semi-supervised learning can help, and corroborate it with empirical results.

* Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012). arXiv admin note: substantial text overlap with arXiv:1112.2738

Detecting low-complexity unobserved causes

Feb 14, 2012

Dominik Janzing, Eleni Sgouritsa, Oliver Stegle, Jonas Peters, Bernhard Schoelkopf

Abstract: We describe a method that infers whether statistical dependences between two observed variables X and Y are due to a "direct" causal link or only to a connecting causal path that contains an unobserved variable of low complexity, e.g., a binary variable. This problem is motivated by statistical genetics: given a genetic marker that is correlated with a phenotype of interest, we want to detect whether this marker is causal or only correlates with a causal one. Our method is based on analyzing the location of the conditional distributions P(Y|x) in the simplex of all distributions of Y. We report encouraging results on semi-empirical data.
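The geometric idea behind using the location of P(Y|x) can be shown at the population level. If X influences Y only through a binary Z (so X and Y are independent given Z), then P(Y|x) = (1 - t_x) P(Y|Z=0) + t_x P(Y|Z=1) with t_x = P(Z=1|x), so every conditional lies on one line segment in the simplex, and the points' affine hull has dimension at most 1. The sketch below checks that criterion with exact arithmetic; it ignores finite-sample noise and is only informative when X takes at least three values, since two points always lie on a segment. The function names and example distributions are hypothetical and this is not the paper's full method.

```python
from fractions import Fraction
from typing import List, Sequence


def _rank(rows: Sequence[Sequence[Fraction]]) -> int:
    """Rank via Gauss-Jordan elimination in exact rational arithmetic."""
    m = [[Fraction(v) for v in row] for row in rows]
    if not m:
        return 0
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def affine_dimension(points: Sequence[Sequence[Fraction]]) -> int:
    """Dimension of the affine hull: rank of the differences to the first point."""
    base = [Fraction(v) for v in points[0]]
    return _rank([[Fraction(v) - b for v, b in zip(p, base)] for p in points[1:]])


def mix_through_binary(p_y_z0: Sequence[Fraction], p_y_z1: Sequence[Fraction],
                       p_z1_given_x: Sequence[Fraction]) -> List[List[Fraction]]:
    """P(Y|x) for each x when X affects Y only through a binary Z."""
    return [[(1 - t) * a + t * b for a, b in zip(p_y_z0, p_y_z1)] for t in p_z1_given_x]


def consistent_with_binary_intermediate(conditionals: Sequence[Sequence[Fraction]]) -> bool:
    """True if all conditionals P(Y|x) lie on a single line segment."""
    return affine_dimension(conditionals) <= 1


F = Fraction
via_z = mix_through_binary([F(1, 2), F(1, 2), F(0)], [F(1, 10), F(3, 10), F(3, 5)],
                           [F(1, 5), F(1, 2), F(4, 5)])
direct = [[F(1, 2), F(1, 4), F(1, 4)], [F(1, 8), F(1, 8), F(3, 4)], [F(1, 3), F(1, 3), F(1, 3)]]
print(consistent_with_binary_intermediate(via_z), consistent_with_binary_intermediate(direct))
```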
