Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Bruno Charron

MEMERAG: A Multilingual End-to-End Meta-Evaluation Benchmark for Retrieval Augmented Generation

Feb 25, 2025

María Andrea Cruz Blandón, Jayasimha Talur, Bruno Charron, Dong Liu, Saab Mansour, Marcello Federico

Abstract:Automatic evaluation of retrieval augmented generation (RAG) systems relies on fine-grained dimensions like faithfulness and relevance, as judged by expert human annotators. Meta-evaluation benchmarks support the development of automatic evaluators that correlate well with human judgement. However, existing benchmarks predominantly focus on English or use translated data, which fails to capture cultural nuances. A native approach provides a better representation of the end user experience. In this work, we develop a Multilingual End-to-end Meta-Evaluation RAG benchmark (MEMERAG). Our benchmark builds on the popular MIRACL dataset, using native-language questions and generating responses with diverse large language models (LLMs), which are then assessed by expert annotators for faithfulness and relevance. We describe our annotation process and show that it achieves high inter-annotator agreement. We then analyse the performance of the answer-generating LLMs across languages as per the human evaluators. Finally we apply the dataset to our main use-case which is to benchmark multilingual automatic evaluators (LLM-as-a-judge). We show that our benchmark can reliably identify improvements offered by advanced prompting techniques and LLMs. We will release our benchmark to support the community developing accurate evaluation methods for multilingual RAG systems.

Via

Access Paper or Ask Questions

Large-scale Recommendation for Portfolio Optimization

Mar 13, 2021

Robin Swezey, Bruno Charron

Figure 1 for Large-scale Recommendation for Portfolio Optimization

Figure 2 for Large-scale Recommendation for Portfolio Optimization

Figure 3 for Large-scale Recommendation for Portfolio Optimization

Figure 4 for Large-scale Recommendation for Portfolio Optimization

Abstract:Individual investors are now massively using online brokers to trade stocks with convenient interfaces and low fees, albeit losing the advice and personalization traditionally provided by full-service brokers. We frame the problem faced by online brokers of replicating this level of service in a low-cost and automated manner for a very large number of users. Because of the care required in recommending financial products, we focus on a risk-management approach tailored to each user's portfolio and risk profile. We show that our hybrid approach, based on Modern Portfolio Theory and Collaborative Filtering, provides a sound and effective solution. The method is applicable to stocks as well as other financial assets, and can be easily combined with various financial forecasting models. We validate our proposal by comparing it with several baselines in a domain expert-based study.

* In Proceedings of the 12th ACM Conference on Recommender Systems (RecSys 2018). Association for Computing Machinery, New York, NY, USA, 382-386

Via

Access Paper or Ask Questions

PiRank: Learning To Rank via Differentiable Sorting

Dec 12, 2020

Robin Swezey, Aditya Grover, Bruno Charron, Stefano Ermon

Figure 1 for PiRank: Learning To Rank via Differentiable Sorting

Figure 2 for PiRank: Learning To Rank via Differentiable Sorting

Figure 3 for PiRank: Learning To Rank via Differentiable Sorting

Figure 4 for PiRank: Learning To Rank via Differentiable Sorting

Abstract:A key challenge with machine learning approaches for ranking is the gap between the performance metrics of interest and the surrogate loss functions that can be optimized with gradient-based methods. This gap arises because ranking metrics typically involve a sorting operation which is not differentiable w.r.t. the model parameters. Prior works have proposed surrogates that are loosely related to ranking metrics or simple smoothed versions thereof. We propose PiRank, a new class of differentiable surrogates for ranking, which employ a continuous, temperature-controlled relaxation to the sorting operator. We show that PiRank exactly recovers the desired metrics in the limit of zero temperature and scales favorably with the problem size, both in theory and practice. Empirically, we demonstrate that PiRank significantly improves over existing approaches on publicly available internet-scale learning-to-rank benchmarks.

Via

Access Paper or Ask Questions