Chengpiao Huang

Uncertainty Quantification for LLM-Based Survey Simulations

Feb 25, 2025

Chengpiao Huang, Yuhang Wu, Kaizheng Wang

Abstract: We investigate the reliable use of simulated survey responses from large language models (LLMs) through the lens of uncertainty quantification. Our approach converts synthetic data into confidence sets for population parameters of human responses, addressing the distribution shift between the simulated and real populations. A key innovation lies in determining the optimal number of simulated responses: too many produce overly narrow confidence sets with poor coverage, while too few yield excessively loose estimates. To resolve this, our method adaptively selects the simulation sample size, ensuring valid average-case coverage guarantees. It is broadly applicable to any LLM, irrespective of its fidelity, and any procedure for constructing confidence sets. Additionally, the selected sample size quantifies the degree of misalignment between the LLM and the target human population. We illustrate our method on real datasets and LLMs.

* 30 pages, 6 figures, 10 tables
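As a minimal sketch of how a simulation sample size could be chosen, assume a set of calibration questions where both simulated LLM responses and an observed human mean are available. For each candidate k, the sketch builds a normal-approximation interval from the first k simulated responses to every question. It then keeps the largest k whose intervals cover the human means on at least a 1 - alpha fraction of questions. The interval construction, this coverage rule, and the names `normal_ci` and `select_sample_size` are illustrative assumptions. They are not the paper's exact procedure.

```python
import math
from statistics import mean, stdev


def normal_ci(samples, z=1.96):
    """Normal-approximation interval for the mean of `samples` (needs >= 2 values)."""
    m = mean(samples)
    half = z * stdev(samples) / math.sqrt(len(samples))
    return m - half, m + half


def select_sample_size(synthetic, human_means, candidate_ks, alpha=0.1):
    """Return the largest candidate k whose intervals reach empirical coverage >= 1 - alpha.

    Hypothetical setup (an assumption, not from the paper):
      synthetic[j]   -- simulated LLM responses to calibration question j
      human_means[j] -- observed human mean for question j, used as the target
    Returns None if no candidate k reaches the target coverage.
    """
    best = None
    for k in sorted(candidate_ks):
        covered = 0
        for sims, target in zip(synthetic, human_means):
            lo, hi = normal_ci(sims[:k])  # interval from the first k simulated responses
            covered += lo <= target <= hi
        if covered / len(human_means) >= 1 - alpha:
            best = k  # later (larger) passing k values overwrite earlier ones
    return best


# Simulated answers with a constant +0.5 bias versus a human mean of 0.
biased = [[1.5, -0.5] * 20 for _ in range(10)]
unbiased = [[1.0, -1.0] * 20 for _ in range(10)]
targets = [0.0] * 10
print(select_sample_size(biased, targets, [4, 10, 40]))    # 10
print(select_sample_size(unbiased, targets, [4, 10, 40]))  # 40
```

With the biased simulator, intervals built from 40 responses are narrower than the 0.5 bias and miss the target, so a smaller k is selected. The unbiased simulator keeps the largest candidate. This loosely illustrates how a selected size can signal misalignment.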

A Similarity Measure Between Functions with Applications to Statistical Learning and Optimization

Jan 14, 2025

Chengpiao Huang, Kaizheng Wang

Abstract: In this note, we present a novel measure of similarity between two functions. It quantifies how the sub-optimality gaps of two functions convert to each other, and unifies several existing notions of functional similarity. We show that it has convenient operation rules, and illustrate its use in empirical risk minimization and non-stationary online optimization.

* 9 pages
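The abstract does not state the definition of the measure. The following is therefore only an illustrative proxy in the same spirit, evaluated on a finite grid. Given an assumed multiplicative factor c, it finds the smallest additive slack delta such that each function's sub-optimality gap is at most c times the other's plus delta. The function name `gap_conversion_slack` and this particular (c, delta) form are hypothetical. They are not the measure defined in the note.

```python
def gap_conversion_slack(f_vals, g_vals, c=1.0):
    """Smallest delta such that, at every grid point,
    gap_g <= c * gap_f + delta  and  gap_f <= c * gap_g + delta,
    where gap_h(x) = h(x) - min over the grid of h. Illustrative proxy only (assumed form).
    """
    f_min, g_min = min(f_vals), min(g_vals)
    gaps_f = [v - f_min for v in f_vals]
    gaps_g = [v - g_min for v in g_vals]
    delta = 0.0
    for a, b in zip(gaps_f, gaps_g):
        # Slack needed at this point for both conversion directions.
        delta = max(delta, b - c * a, a - c * b)
    return delta


xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
f_vals = [x * x for x in xs]
g_vals = [(x - 0.5) ** 2 for x in xs]  # same shape, minimizer shifted by 0.5
print(gap_conversion_slack(f_vals, g_vals))         # 1.25
print(gap_conversion_slack(f_vals, g_vals, c=2.0))  # 0.5
print(gap_conversion_slack(f_vals, [v + 3.0 for v in f_vals]))  # 0.0
```

Because only gaps enter the computation, adding a constant to either function leaves the result unchanged. Allowing a larger multiplicative factor reduces the additive slack needed.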

Distribution-Free Predictive Inference under Unknown Temporal Drift

Jun 10, 2024

Elise Han, Chengpiao Huang, Kaizheng Wang

Abstract: Distribution-free prediction sets play a pivotal role in uncertainty quantification for complex statistical models. Their validity hinges on reliable calibration data, which may not be readily available as real-world environments often undergo unknown changes over time. In this paper, we propose a strategy for choosing an adaptive window and use the data therein to construct prediction sets. The window is selected by optimizing an estimated bias-variance tradeoff. We provide sharp coverage guarantees for our method, showing its adaptivity to the underlying temporal drift. We also illustrate its efficacy through numerical experiments on synthetic and real data.

* 25 pages, 4 figures, 6 tables
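This is a minimal sketch of window selection for split-conformal prediction sets. It assumes nonconformity scores ordered from oldest to newest and a finite list of candidate look-back windows. For each window, it computes the usual split-conformal threshold and uses c / sqrt(w) as a stand-in for the variance term. It estimates the bias by comparing thresholds against those from shorter windows. This comparison is a Goldenshluger-Lepski-style rule swapped in for illustration. The constant c, the bias proxy, and the names `conformal_quantile` and `select_window` are assumptions, and the paper's estimator may differ.

```python
import math


def conformal_quantile(scores, alpha):
    """Split-conformal threshold: the ceil((n + 1)(1 - alpha))-th smallest score."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    return math.inf if rank > n else sorted(scores)[rank - 1]


def select_window(scores, windows, alpha=0.1, c=1.0):
    """Pick a look-back window by minimizing (estimated bias + c / sqrt(w)).

    scores are ordered oldest -> newest; windows are candidate sizes <= len(scores).
    Returns (chosen window, conformal threshold computed on that window).
    """
    windows = sorted(windows)
    if windows[-1] > len(scores):
        raise ValueError("candidate window longer than the score history")
    q = {w: conformal_quantile(scores[-w:], alpha) for w in windows}
    if any(math.isinf(v) for v in q.values()):
        raise ValueError("every window needs ceil((w + 1)(1 - alpha)) <= w")
    sigma = {w: c / math.sqrt(w) for w in windows}  # hypothetical variance proxy
    best_w, best_val = None, math.inf
    for w in windows:
        # Bias proxy: disagreement with shorter windows beyond what noise explains.
        bias = max(max(abs(q[w] - q[v]) - sigma[v] - sigma[w], 0.0)
                   for v in windows if v <= w)
        if bias + sigma[w] < best_val:
            best_w, best_val = w, bias + sigma[w]
    return best_w, q[best_w]


stationary = [i % 10 for i in range(200)]
drifted = [100] * 150 + [i % 10 for i in range(50)]  # old scores are much larger
print(select_window(stationary, [20, 50, 100, 200]))  # (200, 9)
print(select_window(drifted, [20, 50, 100, 200]))     # (50, 9)
```

On stationary scores, every window gives the same threshold, so the longest window wins on variance. After the shift, windows that reach back into the old regime inflate the threshold. Those windows are penalized, and the sketch keeps only recent data.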

Model Assessment and Selection under Temporal Distribution Shift

Feb 13, 2024

Elise Han, Chengpiao Huang, Kaizheng Wang

Abstract: We investigate model assessment and selection in a changing environment, by synthesizing datasets from both the current time period and historical epochs. To tackle unknown and potentially arbitrary temporal distribution shift, we develop an adaptive rolling window approach to estimate the generalization error of a given model. This strategy also facilitates the comparison between any two candidate models by estimating the difference of their generalization errors. We further integrate pairwise comparisons into a single-elimination tournament, achieving near-optimal model selection from a collection of candidates. Theoretical analyses and numerical experiments demonstrate the adaptivity of our proposed methods to the non-stationarity in data.

* 24 pages, 6 figures
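The single-elimination step can be sketched as follows, assuming a pairwise comparator `beats(a, b)`. For illustration, the comparator compares the average losses of two models over a fixed number of recent samples. In the paper, that loss difference is estimated with the adaptive rolling window, which this sketch does not implement. The names `tournament` and `make_comparator` and the fixed window are hypothetical.

```python
import random


def make_comparator(losses, window):
    """losses[m] is model m's per-sample loss history, oldest -> newest (hypothetical)."""
    def beats(a, b):
        # Mean loss difference over the last `window` samples; ties go to `a`.
        diffs = [x - y for x, y in zip(losses[a][-window:], losses[b][-window:])]
        return sum(diffs) / len(diffs) <= 0
    return beats


def tournament(models, beats, seed=0):
    """Single-elimination bracket: pair survivors each round and keep the winners."""
    survivors = list(models)
    random.Random(seed).shuffle(survivors)  # random seeding of the bracket
    while len(survivors) > 1:
        next_round = [a if beats(a, b) else b
                      for a, b in zip(survivors[0::2], survivors[1::2])]
        if len(survivors) % 2 == 1:
            next_round.append(survivors[-1])  # odd model out gets a bye
        survivors = next_round
    return survivors[0]


losses = {"A": [3, 3, 3, 1, 1], "B": [0, 0, 0, 2, 2], "C": [5, 5, 5, 5, 5]}
print(tournament(["A", "B", "C"], make_comparator(losses, window=2)))  # A
print(tournament(["A", "B", "C"], make_comparator(losses, window=5)))  # B
```

Each comparison eliminates one model, so a bracket over K candidates uses exactly K - 1 comparisons. The two calls show why the window matters. Model A is best on recent data, while the full history favors B.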

A Stability Principle for Learning under Non-Stationarity

Oct 27, 2023

Chengpiao Huang, Kaizheng Wang

Abstract: We develop a versatile framework for statistical learning in non-stationary environments. In each time period, our approach applies a stability principle to select a look-back window that maximizes the utilization of historical data while keeping the cumulative bias within an acceptable range relative to the stochastic error. Our theory showcases the adaptability of this approach to unknown non-stationarity. The regret bound is minimax optimal up to logarithmic factors when the population losses are either strongly convex or merely Lipschitz. At the heart of our analysis lie two novel components: a measure of similarity between functions and a segmentation technique for dividing the non-stationary data sequence into quasi-stationary pieces.

* 47 pages, 1 figure
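This is a minimal sketch of a stability-style look-back rule, specialized to estimating a mean under squared loss. That specialization is an assumption, since the paper treats general losses. The rule starts from the shortest candidate window and keeps extending it while the new estimate stays within c / sqrt(i) + c / sqrt(k) of every estimate from a shorter window i. It stops at the first violation. This pairwise check is a Lepski-type stand-in for the paper's stability criterion. The threshold form, the constant c, and the name `stable_window` are hypothetical.

```python
import math


def stable_window(xs, windows, c=1.0):
    """Largest look-back window whose mean agrees with all shorter-window means.

    xs is ordered oldest -> newest; windows are candidate sizes <= len(xs).
    Returns (chosen window, sample mean over that window).
    """
    windows = sorted(windows)
    est = {k: sum(xs[-k:]) / k for k in windows}   # ERM under squared loss = sample mean
    tau = {k: c / math.sqrt(k) for k in windows}   # hypothetical stochastic-error scale
    chosen = windows[0]
    for k in windows:
        if all(abs(est[k] - est[i]) <= tau[i] + tau[k] for i in windows if i < k):
            chosen = k  # bias still small relative to noise: use more history
        else:
            break       # first instability: stop extending the window
    return chosen, est[chosen]


print(stable_window([1.0] * 100, [5, 10, 20, 40, 80, 100]))                # (100, 1.0)
print(stable_window([10.0] * 50 + [0.0] * 50, [5, 10, 20, 40, 80, 100]))  # (40, 0.0)
```

When the data are stationary, the rule uses all available history. After the level shift, it stops just before the window would start mixing in the old regime.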
