Charles Marx

Sharpe Ratio-Guided Active Learning for Preference Optimization in RLHF

Mar 28, 2025

Syrine Belakaria, Joshua Kazdan, Charles Marx, Chris Cundy, Willie Neiswanger, Sanmi Koyejo, Barbara E. Engelhardt, Stefano Ermon

Abstract: Reinforcement learning from human feedback (RLHF) has become a cornerstone of the training and alignment pipeline for large language models (LLMs). Recent advances, such as direct preference optimization (DPO), have simplified the preference learning step. However, collecting preference data remains a challenging and costly process, often requiring expert annotation. This cost can be mitigated by carefully selecting the data points presented for annotation. In this work, we propose an active learning approach to efficiently select prompt and preference pairs using a risk assessment strategy based on the Sharpe Ratio. To address the challenge of unknown preferences prior to annotation, our method evaluates the gradients of all potential preference annotations to assess their impact on model updates. These gradient-based evaluations enable risk assessment of data points regardless of the annotation outcome. By leveraging the DPO loss derivations, we derive a closed-form expression for computing these Sharpe ratios on a per-tuple basis, ensuring our approach remains both tractable and computationally efficient. We also introduce two variants of our method, each making different assumptions about prior information. Experimental results demonstrate that our method outperforms the baseline by up to 5% in win rates against the chosen completion with limited human preference data across several language models and real-world datasets.
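
The selection idea in this abstract can be illustrated with a minimal sketch. It is not the paper's closed-form expression, which is not given here. It assumes the standard DPO loss, measures the size of an update by the absolute derivative of that loss with respect to the implicit reward margin, and treats the unknown annotation as a coin flip with a hypothetical prior `p_a_preferred`. The function names `dpo_margin`, `sharpe_score`, and `select_top_k` are illustrative only.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def dpo_margin(logp_a, logp_ref_a, logp_b, logp_ref_b, beta=0.1):
    """Implicit DPO reward margin between completions a and b."""
    return beta * ((logp_a - logp_ref_a) - (logp_b - logp_ref_b))


def sharpe_score(margin, p_a_preferred=0.5, eps=1e-8):
    """Mean over standard deviation of the gradient scale across both labels.

    Assumption: the gradient scale is |d loss / d margin| of the DPO loss,
    which is 1 - sigmoid(margin) if a is preferred and sigmoid(margin) if b is.
    """
    s = sigmoid(margin)
    g_if_a, g_if_b = 1.0 - s, s
    p = p_a_preferred
    mean = p * g_if_a + (1.0 - p) * g_if_b
    # Variance of a two-point distribution with masses p and 1 - p.
    var = p * (1.0 - p) * (g_if_a - g_if_b) ** 2
    return mean / (math.sqrt(var) + eps)


def select_top_k(candidates, k):
    """candidates: list of (tuple_id, margin); returns the k highest-scoring ids."""
    ranked = sorted(candidates, key=lambda c: sharpe_score(c[1]), reverse=True)
    return [tuple_id for tuple_id, _ in ranked[:k]]
```

Under these assumptions a tuple whose margin is near zero yields the same gradient scale whichever label comes back, so its score is high and it is sent for annotation first.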

Classification with Conceptual Safeguards

Nov 07, 2024

Hailey Joren, Charles Marx, Berk Ustun

Abstract: We propose a new approach to promote safety in classification tasks with established concepts. Our approach -- called a conceptual safeguard -- acts as a verification layer for models that predict a target outcome by first predicting the presence of intermediate concepts. Given this architecture, a safeguard ensures that a model meets a minimal level of accuracy by abstaining from uncertain predictions. In contrast to a standard selective classifier, a safeguard provides an avenue to improve coverage by allowing a human to confirm the presence of uncertain concepts on instances on which it abstains. We develop methods to build safeguards that maximize coverage without compromising safety, namely techniques to propagate the uncertainty in concept predictions and to flag salient concepts for human review. We benchmark our approach on a collection of real-world and synthetic datasets, showing that it can improve performance and coverage in deep learning tasks.

* International Conference on Learning Representations (ICLR), 2024
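
A rough sketch of the abstain-then-confirm loop described in this abstract follows. It is an illustration under stated assumptions, not the authors' method: concepts are treated as independent binary variables, `front_end` is a hypothetical function mapping a concept configuration to P(y = 1), and the confidence threshold is arbitrary.

```python
from itertools import product


def propagate(concept_probs, front_end):
    """P(y = 1), marginalised over every binary concept configuration.

    Assumption: concepts are independent given the input.
    """
    total = 0.0
    for config in product([0, 1], repeat=len(concept_probs)):
        weight = 1.0
        for present, q in zip(config, concept_probs):
            weight *= q if present else 1.0 - q
        total += weight * front_end(config)
    return total


def safeguard(concept_probs, front_end, threshold=0.9):
    """Return a label (0 or 1), or None to abstain when confidence is too low."""
    p = propagate(concept_probs, front_end)
    if max(p, 1.0 - p) < threshold:
        return None
    return 1 if p >= 0.5 else 0


def confirm(concept_probs, index, value):
    """Replace one uncertain concept probability with a human-confirmed 0 or 1."""
    updated = list(concept_probs)
    updated[index] = float(value)
    return updated


# Hypothetical front end: the outcome holds only if both concepts are present.
both_present = lambda config: float(config[0] and config[1])
```

With concept probabilities `[0.95, 0.6]` this safeguard abstains. Once a human confirms the second concept, `confirm([0.95, 0.6], 1, 1)` lifts the propagated confidence above the threshold and the instance is covered.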

Calibrated Probabilistic Forecasts for Arbitrary Sequences

Sep 27, 2024

Charles Marx, Volodymyr Kuleshov, Stefano Ermon

Abstract: Real-world data streams can change unpredictably due to distribution shifts, feedback loops and adversarial actors, which challenges the validity of forecasts. We present a forecasting framework ensuring valid uncertainty estimates regardless of how data evolves. Leveraging the concept of Blackwell approachability from game theory, we introduce a forecasting framework that guarantees calibrated uncertainties for outcomes in any compact space (e.g., classification or bounded regression). We extend this framework to recalibrate existing forecasters, guaranteeing accurate uncertainties without sacrificing predictive performance. We implement both general-purpose gradient-based algorithms and algorithms optimized for popular special cases of our framework. Empirically, our algorithms improve calibration and downstream decision-making for energy systems.
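
To make "calibrated uncertainties" concrete for the binary case, the sketch below computes a binned calibration error over a sequence of forecasts and outcomes. It only measures the quantity that a calibrated forecaster drives toward zero. It does not implement the approachability-based algorithms from the paper, and the equal-width binning and the name `binned_calibration_error` are assumptions made for illustration.

```python
def binned_calibration_error(forecasts, outcomes, n_bins=10):
    """Weighted average gap between mean forecast and outcome frequency per bin.

    forecasts: probabilities in [0, 1]; outcomes: 0 or 1, in the same order.
    """
    # Each bin accumulates [sum of forecasts, sum of outcomes, count].
    bins = [[0.0, 0.0, 0] for _ in range(n_bins)]
    for p, y in zip(forecasts, outcomes):
        b = min(int(p * n_bins), n_bins - 1)
        bins[b][0] += p
        bins[b][1] += y
        bins[b][2] += 1
    n = len(forecasts)
    # (count / n) * |mean forecast - frequency| simplifies to |sum_p - sum_y| / n.
    return sum(abs(sp - sy) / n for sp, sy, count in bins if count)
```

A forecaster that always says 0.5 on an alternating sequence scores zero here, while one that always says 0.9 on a sequence of zeros scores 0.9.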

Calibration by Distribution Matching: Trainable Kernel Calibration Metrics

Oct 31, 2023

Charles Marx, Sofian Zalouk, Stefano Ermon

Abstract: Calibration ensures that probabilistic forecasts meaningfully capture uncertainty by requiring that predicted probabilities align with empirical frequencies. However, many existing calibration methods are specialized for post-hoc recalibration, which can worsen the sharpness of forecasts. Drawing on the insight that calibration can be viewed as a distribution matching task, we introduce kernel-based calibration metrics that unify and generalize popular forms of calibration for both classification and regression. These metrics admit differentiable sample estimates, making it easy to incorporate a calibration objective into empirical risk minimization. Furthermore, we provide intuitive mechanisms to tailor calibration metrics to a decision task, and enforce accurate loss estimation and no regret decisions. Our empirical evaluation demonstrates that employing these metrics as regularizers enhances calibration, sharpness, and decision-making across a range of regression and classification tasks, outperforming methods relying solely on post-hoc recalibration.
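
The distribution-matching view can be sketched with a maximum mean discrepancy (MMD) between observed pairs (x, y) and pairs (x, ŷ) where ŷ is drawn from the forecast. This is a generic biased MMD estimate with an RBF kernel over scalar inputs and targets, chosen for illustration. The paper's specific metrics and kernels are not reproduced, and `calibration_mmd` is a hypothetical name.

```python
import math


def rbf(u, v, bandwidth=1.0):
    """Gaussian RBF kernel between two equal-length tuples."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(sample_p, sample_q, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    def mean_kernel(xs, ys):
        total = sum(rbf(x, y, bandwidth) for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    return (mean_kernel(sample_p, sample_p)
            + mean_kernel(sample_q, sample_q)
            - 2.0 * mean_kernel(sample_p, sample_q))


def calibration_mmd(inputs, observed, simulated, bandwidth=1.0):
    """Compare observed (x, y) pairs with (x, y_hat) pairs sampled from a forecast."""
    real = list(zip(inputs, observed))
    fake = list(zip(inputs, simulated))
    return mmd_squared(real, fake, bandwidth)
```

Written in an autodiff framework such as JAX or PyTorch, the same estimate would be differentiable in the forecast samples, which is what allows a term like this to be added to a training loss as a regularizer.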

Modular Conformal Calibration

Jul 05, 2022

Charles Marx, Shengjia Zhao, Willie Neiswanger, Stefano Ermon

Abstract: Uncertainty estimates must be calibrated (i.e., accurate) and sharp (i.e., informative) in order to be useful. This has motivated a variety of methods for recalibration, which use held-out data to turn an uncalibrated model into a calibrated model. However, the applicability of existing methods is limited due to their assumption that the original model is also a probabilistic model. We introduce a versatile class of algorithms for recalibration in regression that we call Modular Conformal Calibration (MCC). This framework allows one to transform any regression model into a calibrated probabilistic model. The modular design of MCC allows us to make simple adjustments to existing algorithms that enable well-behaved distribution predictions. We also provide finite-sample calibration guarantees for MCC algorithms. Our framework recovers isotonic recalibration, conformal calibration, and conformal interval prediction, implying that our theoretical results apply to those methods as well. Finally, we conduct an empirical study of MCC on 17 regression datasets. Our results show that new algorithms designed in our framework achieve near-perfect calibration and improve sharpness relative to existing methods.
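
One simple way to turn a point-prediction regressor into a probabilistic one with held-out data is to reuse the empirical distribution of its calibration residuals. The sketch below shows that residual-based recalibrator only. It is not claimed to be any particular MCC algorithm from the paper, it omits finite-sample corrections, and the function names are illustrative.

```python
import bisect
import math


def fit_residuals(model, xs, ys):
    """Sorted residuals y - model(x) on a held-out calibration set."""
    return sorted(y - model(x) for x, y in zip(xs, ys))


def predictive_cdf(model, residuals, x, y):
    """F(y | x): fraction of calibration residuals at most y - model(x)."""
    return bisect.bisect_right(residuals, y - model(x)) / len(residuals)


def predictive_quantile(model, residuals, x, level):
    """Point prediction shifted by the empirical residual quantile at `level`."""
    n = len(residuals)
    idx = min(n - 1, max(0, math.ceil(level * n) - 1))
    return model(x) + residuals[idx]
```

For example, with `model = lambda x: 2 * x` and `residuals = fit_residuals(model, xs, ys)`, the interval from `predictive_quantile(model, residuals, x, 0.05)` to `predictive_quantile(model, residuals, x, 0.95)` gives an approximate 90% prediction interval whose width is set by the held-out residuals and not by the model itself.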
