Jiri Navratil

Distributional Preference Alignment of LLMs via Optimal Transport

Jun 09, 2024

Igor Melnyk, Youssef Mroueh, Brian Belgodere, Mattia Rigotti, Apoorva Nitsure, Mikhail Yurochkin, Kristjan Greenewald, Jiri Navratil, Jerret Ross

Abstract: Current LLM alignment techniques use pairwise human preferences at a sample level, and as such, they do not imply an alignment on the distributional level. We propose Alignment via Optimal Transport (AOT), a novel method for distributional preference alignment of LLMs. AOT aligns LLMs on unpaired preference data by making the reward distribution of the positive samples stochastically dominant in the first order over the reward distribution of the negative samples. We introduce a convex relaxation of this first-order stochastic dominance and cast it as an optimal transport problem with a smooth and convex cost. Thanks to the one-dimensional nature of the resulting optimal transport problem and the convexity of the cost, it has a closed-form solution via sorting on empirical measures. We fine-tune LLMs with this AOT objective, which enables alignment by penalizing violations of the stochastic dominance of the reward distribution of the positive samples over the reward distribution of the negative samples. We analyze the sample complexity of AOT by considering the dual of the OT problem and show that it converges at the parametric rate. Empirically, we show on a diverse set of alignment datasets and LLMs that AOT leads to state-of-the-art models in the 7B family of models when evaluated with Open LLM Benchmarks and AlpacaEval.
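
The sorting-based penalty described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the `beta` scale, and the use of softplus as the smooth convex cost are assumptions made for illustration, and the sketch assumes equally many positive and negative samples.

```python
import numpy as np


def softplus(x):
    # Numerically stable log(1 + exp(x)): a smooth, convex stand-in for a
    # hinge penalty (an assumed choice, not necessarily the paper's cost).
    return np.logaddexp(0.0, x)


def dominance_violation(pos_rewards, neg_rewards, beta=1.0):
    """Penalize violations of first-order dominance of positives over negatives.

    Sorting both unpaired samples matches their empirical quantiles, which is
    the closed-form 1-D optimal transport coupling for a convex cost.
    """
    pos = np.sort(np.asarray(pos_rewards, dtype=float))
    neg = np.sort(np.asarray(neg_rewards, dtype=float))
    if pos.shape != neg.shape:
        raise ValueError("sketch assumes equally many positive and negative samples")
    # A quantile where the negative reward exceeds the positive one is a violation.
    return float(np.mean(softplus(beta * (neg - pos))))
```

Because both arrays are sorted before comparison, the value depends only on the two empirical distributions and not on how individual samples are paired, which is why unpaired preference data is sufficient.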

Risk Assessment and Statistical Significance in the Age of Foundation Models

Oct 11, 2023

Apoorva Nitsure, Youssef Mroueh, Mattia Rigotti, Kristjan Greenewald, Brian Belgodere, Mikhail Yurochkin, Jiri Navratil, Igor Melnyk, Jerret Ross

Abstract: We propose a distributional framework for assessing socio-technical risks of foundation models with quantified statistical significance. Our approach hinges on a new statistical relative testing based on first and second order stochastic dominance of real random variables. We show that the second order statistics in this test are linked to mean-risk models commonly used in econometrics and mathematical finance to balance risk and utility when choosing between alternatives. Using this framework, we formally develop a risk-aware approach for foundation model selection given guardrails quantified by specified metrics. Inspired by portfolio optimization and selection theory in mathematical finance, we define a "metrics portfolio" for each model as a means to aggregate a collection of metrics, and perform model selection based on the stochastic dominance of these portfolios. The statistical significance of our tests is backed theoretically by an asymptotic analysis via central limit theorems instantiated in practice via a bootstrap variance estimate. We use our framework to compare various large language models regarding risks related to drifting from instructions and outputting toxic content.
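
To make the idea of comparing two models by first-order stochastic dominance concrete, here is a rough sketch. It is not the paper's test statistic: the quantile-based violation fraction, the function names, and the parameters `n_levels`, `n_boot`, and `seed` are all assumptions, and the metric is assumed to be one where higher values are better.

```python
import numpy as np


def fsd_violation(a, b, n_levels=99):
    """Fraction of quantile levels at which sample `a` falls below sample `b`.

    A value of 0.0 means `a` dominates `b` in the first order on this grid
    of levels. Illustrative statistic only (an assumption of this sketch).
    """
    levels = np.linspace(0.01, 0.99, n_levels)
    qa = np.quantile(np.asarray(a, dtype=float), levels)
    qb = np.quantile(np.asarray(b, dtype=float), levels)
    return float(np.mean(qa < qb))


def bootstrap_std(a, b, n_boot=200, seed=0):
    """Bootstrap estimate of the standard deviation of the violation fraction."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    stats = [
        fsd_violation(
            rng.choice(a, size=a.size, replace=True),
            rng.choice(b, size=b.size, replace=True),
        )
        for _ in range(n_boot)
    ]
    return float(np.std(stats, ddof=1))
```

In use, `a` and `b` would hold per-example scores of two models on the same metric; a small violation fraction with a small bootstrap spread suggests that the first model is preferable across the whole distribution rather than only on average.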

Assessment of Prediction Intervals Using Uncertainty Characteristics Curves

Oct 04, 2023

Jiri Navratil, Benjamin Elder, Matthew Arnold, Soumya Ghosh, Prasanna Sattigeri

Abstract: Accurate quantification of model uncertainty has long been recognized as a fundamental requirement for trusted AI. In regression tasks, uncertainty is typically quantified using prediction intervals calibrated to an ad-hoc operating point, making evaluation and comparison across different studies relatively difficult. Our work leverages: (1) the concept of operating characteristics curves and (2) the notion of a gain over a null reference, to derive a novel operating point agnostic assessment methodology for prediction intervals. The paper defines the Uncertainty Characteristics Curve and demonstrates its utility in selected scenarios. We argue that the proposed method addresses the current need for comprehensive assessment of prediction intervals and thus represents a valuable addition to the uncertainty quantification toolbox.

* Published at Workshop on Distribution-Free Uncertainty Quantification, International Conference on Machine Learning (ICML), July 2022. arXiv admin note: substantial text overlap with arXiv:2106.00858

Auditing and Generating Synthetic Data with Controllable Trust Trade-offs

May 02, 2023

Brian Belgodere, Pierre Dognin, Adam Ivankay, Igor Melnyk, Youssef Mroueh, Aleksandra Mojsilovic, Jiri Navratil, Apoorva Nitsure, Inkit Padhi, Mattia Rigotti (+4 more)

Abstract: Data collected from the real world tends to be biased, unbalanced, and at risk of exposing sensitive and private information. This reality has given rise to the idea of creating synthetic datasets to alleviate risk, bias, harm, and privacy concerns inherent in the real data. This concept relies on Generative AI models to produce unbiased, privacy-preserving synthetic data while being true to the real data. In this new paradigm, how can we tell if this approach delivers on its promises? We present an auditing framework that offers a holistic assessment of synthetic datasets and AI models trained on them, centered around bias and discrimination prevention, fidelity to the real data, utility, robustness, and privacy preservation. We showcase our framework by auditing multiple generative models on diverse use cases, including education, healthcare, banking, human resources, and across different modalities, from tabular, to time-series, to natural language. Our use cases demonstrate the importance of a holistic assessment in order to ensure compliance with socio-technical safeguards that regulators and policymakers are increasingly enforcing. For this purpose, we introduce the trust index that ranks multiple synthetic datasets based on their prescribed safeguards and their desired trade-offs. Moreover, we devise a trust-index-driven model selection and cross-validation procedure via auditing in the training loop that we showcase on a class of transformer models that we dub TrustFormers, across different modalities. This trust-driven model selection allows for controllable trust trade-offs in the resulting synthetic data. We instrument our auditing framework with workflows that connect different stakeholders from model development to audit and certification via a synthetic data auditing report.

* 49 pages; submitted

Uncertainty Quantification 360: A Holistic Toolkit for Quantifying and Communicating the Uncertainty of AI

Jun 04, 2021

Soumya Ghosh, Q. Vera Liao, Karthikeyan Natesan Ramamurthy, Jiri Navratil, Prasanna Sattigeri, Kush R. Varshney, Yunfeng Zhang

Abstract: In this paper, we describe an open source Python toolkit named Uncertainty Quantification 360 (UQ360) for the uncertainty quantification of AI models. The goal of this toolkit is twofold: first, to provide a broad range of capabilities to streamline as well as foster the common practices of quantifying, evaluating, improving, and communicating uncertainty in the AI application development lifecycle; second, to encourage further exploration of UQ's connections to other pillars of trustworthy AI such as fairness and transparency through the dissemination of latest research and education materials. Beyond the Python package (https://github.com/IBM/UQ360), we have developed an interactive experience (http://uq360.mybluemix.net) and guidance materials as educational tools to aid researchers and developers in producing and communicating high-quality uncertainties in an effective manner.

* Added references

Uncertainty Characteristics Curves: A Systematic Assessment of Prediction Intervals

Jun 01, 2021

Jiri Navratil, Benjamin Elder, Matthew Arnold, Soumya Ghosh, Prasanna Sattigeri

Abstract: Accurate quantification of model uncertainty has long been recognized as a fundamental requirement for trusted AI. In regression tasks, uncertainty is typically quantified using prediction intervals calibrated to a specific operating point, making evaluation and comparison across different studies difficult. Our work leverages: (1) the concept of operating characteristics curves and (2) the notion of a gain over a simple reference, to derive a novel operating point agnostic assessment methodology for prediction intervals. The paper describes the corresponding algorithm, provides a theoretical analysis, and demonstrates its utility in multiple scenarios. We argue that the proposed method addresses the current need for comprehensive assessment of prediction intervals and thus represents a valuable addition to the uncertainty quantification toolbox.
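
One way to picture an operating-point-agnostic assessment is to sweep a scale factor over the predicted interval widths and record one operating point per scale. The sketch below does this under stated assumptions: the abstract does not specify the curve's axes, so pairing mean interval width with miss rate, the symmetric intervals, and all names (`characteristics_curve`, `half_width`, `scales`) are illustrative choices, and the gain over a reference is not covered.

```python
import numpy as np


def characteristics_curve(y_true, y_pred, half_width, scales):
    """Trace (mean interval width, miss rate) points as intervals are rescaled.

    Each scale factor s yields intervals [y_pred - s*h, y_pred + s*h], i.e.
    one operating point; sweeping s gives a curve rather than one number.
    Axes and symmetric intervals are assumptions of this sketch.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    half_width = np.asarray(half_width, dtype=float)
    points = []
    for s in scales:
        lower = y_pred - s * half_width
        upper = y_pred + s * half_width
        # Miss rate: fraction of targets falling outside their interval.
        miss_rate = float(np.mean((y_true < lower) | (y_true > upper)))
        mean_width = float(np.mean(upper - lower))
        points.append((mean_width, miss_rate))
    return points
```

Two interval estimators can then be compared by their whole curves (for example, by the area under them) instead of at a single calibration level.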

* 10 pages main paper, 9 pages appendix

Learning Prediction Intervals for Model Performance

Dec 15, 2020

Benjamin Elder, Matthew Arnold, Anupama Murthi, Jiri Navratil

Abstract: Understanding model performance on unlabeled data is a fundamental challenge of developing, deploying, and maintaining AI systems. Model performance is typically evaluated using test sets or periodic manual quality assessments, both of which require laborious manual data labeling. Automated performance prediction techniques aim to mitigate this burden, but potential inaccuracy and a lack of trust in their predictions have prevented their widespread adoption. We address this core problem of performance prediction uncertainty with a method to compute prediction intervals for model performance. Our methodology uses transfer learning to train an uncertainty model to estimate the uncertainty of model performance predictions. We evaluate our approach across a wide range of drift conditions and show substantial improvement over competitive baselines. We believe this result makes prediction intervals, and performance prediction in general, significantly more practical for real-world use.

* 7+6 pages, 5 figures, AAAI 2021

Not Your Grandfathers Test Set: Reducing Labeling Effort for Testing

Jul 10, 2020

Begum Taskazan, Jiri Navratil, Matthew Arnold, Anupama Murthi, Ganesh Venkataraman, Benjamin Elder

Abstract: Building and maintaining high-quality test sets remains a laborious and expensive task. As a result, test sets in the real world are often not properly kept up to date and drift from the production traffic they are supposed to represent. The frequency and severity of this drift raise serious concerns over the value of manually labeled test sets in the QA process. This paper proposes a simple but effective technique that drastically reduces the effort needed to construct and maintain a high-quality test set (reducing labeling effort by 80-100% across a range of practical scenarios). This result encourages a fundamental rethinking of the testing process by both practitioners, who can use these techniques immediately to improve their testing, and researchers, who can help address many of the open questions raised by this new approach.

* International Workshop on Challenges in Deploying and Monitoring Machine Learning Systems in Conjunction with ICML 2020

Uncertainty Prediction for Deep Sequential Regression Using Meta Models

Jul 02, 2020

Jiri Navratil, Matthew Arnold, Benjamin Elder

Abstract: Generating high-quality uncertainty estimates for sequential regression, particularly deep recurrent networks, remains a challenging and open problem. Existing approaches often make restrictive assumptions (such as stationarity) yet still perform poorly in practice, particularly in the presence of real-world non-stationary signals and drift. This paper describes a flexible method that can generate symmetric and asymmetric uncertainty estimates, makes no assumptions about stationarity, and outperforms competitive baselines on both drift and non-drift scenarios. This work helps make sequential regression more effective and practical for use in real-world applications, and is a powerful new addition to the modeling toolbox for sequential uncertainty quantification in general.

* 8 pages main paper + 11 pages appendix/references; 10 figures

Towards Automating the AI Operations Lifecycle

Mar 28, 2020

Matthew Arnold, Jeffrey Boston, Michael Desmond, Evelyn Duesterwald, Benjamin Elder, Anupama Murthi, Jiri Navratil, Darrell Reimer

Abstract: Today's AI deployments often require significant human involvement and skill in the operational stages of the model lifecycle, including pre-release testing, monitoring, problem diagnosis and model improvements. We present a set of enabling technologies that can be used to increase the level of automation in AI operations, thus lowering the human effort required. Since a common source of human involvement is the need to assess the performance of deployed models, we focus on technologies for performance prediction and KPI analysis and show how they can be used to improve automation in the key stages of a typical AI operations pipeline.
