Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Shiva Prasad Kasiviswanathan

A Classical View on Benign Overfitting: The Role of Sample Size

May 16, 2025

Junhyung Park, Patrick Bloebaum, Shiva Prasad Kasiviswanathan

Abstract:Benign overfitting is a phenomenon in machine learning where a model perfectly fits (interpolates) the training data, including noisy examples, yet still generalizes well to unseen data. Understanding this phenomenon has attracted considerable attention in recent years. In this work, we introduce a conceptual shift, by focusing on almost benign overfitting, where models simultaneously achieve both arbitrarily small training and test errors. This behavior is characteristic of neural networks, which often achieve low (but non-zero) training error while still generalizing well. We hypothesize that this almost benign overfitting can emerge even in classical regimes, by analyzing how the interaction between sample size and model complexity enables larger models to achieve both good training fit but still approach Bayes-optimal generalization. We substantiate this hypothesis with theoretical evidence from two case studies: (i) kernel ridge regression, and (ii) least-squares regression using a two-layer fully connected ReLU neural network trained via gradient flow. In both cases, we overcome the strong assumptions often required in prior work on benign overfitting. Our results on neural networks also provide the first generalization result in this setting that does not rely on any assumptions about the underlying regression function or noise, beyond boundedness. Our analysis introduces a novel proof technique based on decomposing the excess risk into estimation and approximation errors, interpreting gradient flow as an implicit regularizer, that helps avoid uniform convergence traps. This analysis idea could be of independent interest.

* The results here subsume: arXiv:2410.06191

Via

Access Paper or Ask Questions

$β$-calibration of Language Model Confidence Scores for Generative QA

Oct 09, 2024

Putra Manggala, Atalanti Mastakouri, Elke Kirschbaum, Shiva Prasad Kasiviswanathan, Aaditya Ramdas

Figure 1 for $β$-calibration of Language Model Confidence Scores for Generative QA

Figure 2 for $β$-calibration of Language Model Confidence Scores for Generative QA

Figure 3 for $β$-calibration of Language Model Confidence Scores for Generative QA

Figure 4 for $β$-calibration of Language Model Confidence Scores for Generative QA

Abstract:To use generative question-and-answering (QA) systems for decision-making and in any critical application, these systems need to provide well-calibrated confidence scores that reflect the correctness of their answers. Existing calibration methods aim to ensure that the confidence score is on average indicative of the likelihood that the answer is correct. We argue, however, that this standard (average-case) notion of calibration is difficult to interpret for decision-making in generative QA. To address this, we generalize the standard notion of average calibration and introduce $\beta$-calibration, which ensures calibration holds across different question-and-answer groups. We then propose discretized posthoc calibration schemes for achieving $\beta$-calibration.

Via

Access Paper or Ask Questions

Benign Overfitting for Regression with Trained Two-Layer ReLU Networks

Oct 08, 2024

Junhyung Park, Patrick Bloebaum, Shiva Prasad Kasiviswanathan

Abstract:We study the least-square regression problem with a two-layer fully-connected neural network, with ReLU activation function, trained by gradient flow. Our first result is a generalization result, that requires no assumptions on the underlying regression function or the noise other than that they are bounded. We operate in the neural tangent kernel regime, and our generalization result is developed via a decomposition of the excess risk into estimation and approximation errors, viewing gradient flow as an implicit regularizer. This decomposition in the context of neural networks is a novel perspective of gradient descent, and helps us avoid uniform convergence traps. In this work, we also establish that under the same setting, the trained network overfits to the data. Together, these results, establishes the first result on benign overfitting for finite-width ReLU networks for arbitrary regression functions.

* 65 pages

Via

Access Paper or Ask Questions

Differentially Private Conditional Independence Testing

Jun 11, 2023

Iden Kalemaj, Shiva Prasad Kasiviswanathan, Aaditya Ramdas

Figure 1 for Differentially Private Conditional Independence Testing

Figure 2 for Differentially Private Conditional Independence Testing

Figure 3 for Differentially Private Conditional Independence Testing

Figure 4 for Differentially Private Conditional Independence Testing

Abstract:Conditional independence (CI) tests are widely used in statistical data analysis, e.g., they are the building block of many algorithms for causal graph discovery. The goal of a CI test is to accept or reject the null hypothesis that $X \perp \!\!\! \perp Y \mid Z$, where $X \in \mathbb{R}, Y \in \mathbb{R}, Z \in \mathbb{R}^d$. In this work, we investigate conditional independence testing under the constraint of differential privacy. We design two private CI testing procedures: one based on the generalized covariance measure of Shah and Peters (2020) and another based on the conditional randomization test of Cand\`es et al. (2016) (under the model-X assumption). We provide theoretical guarantees on the performance of our tests and validate them empirically. These are the first private CI tests that work for the general case when $Z$ is continuous.

Via

Access Paper or Ask Questions

Debiasing Conditional Stochastic Optimization

Apr 20, 2023

Lie He, Shiva Prasad Kasiviswanathan

Figure 1 for Debiasing Conditional Stochastic Optimization

Figure 2 for Debiasing Conditional Stochastic Optimization

Figure 3 for Debiasing Conditional Stochastic Optimization

Figure 4 for Debiasing Conditional Stochastic Optimization

Abstract:In this paper, we study the conditional stochastic optimization (CSO) problem which covers a variety of applications including portfolio selection, reinforcement learning, robust learning, causal inference, etc. The sample-averaged gradient of the CSO objective is biased due to its nested structure and therefore requires a high sample complexity to reach convergence. We introduce a general stochastic extrapolation technique that effectively reduces the bias. We show that for nonconvex smooth objectives, combining this extrapolation with variance reduction techniques can achieve a significantly better sample complexity than existing bounds. We also develop new algorithms for the finite-sum variant of CSO that also significantly improve upon existing results. Finally, we believe that our debiasing technique could be an interesting tool applicable to other stochastic optimization problems too.

Via

Access Paper or Ask Questions

Interventional and Counterfactual Inference with Diffusion Models

Feb 02, 2023

Patrick Chao, Patrick Blöbaum, Shiva Prasad Kasiviswanathan

Figure 1 for Interventional and Counterfactual Inference with Diffusion Models

Figure 2 for Interventional and Counterfactual Inference with Diffusion Models

Figure 3 for Interventional and Counterfactual Inference with Diffusion Models

Figure 4 for Interventional and Counterfactual Inference with Diffusion Models

Abstract:We consider the problem of answering observational, interventional, and counterfactual queries in a causally sufficient setting where only observational data and the causal graph are available. Utilizing the recent developments in diffusion models, we introduce diffusion-based causal models (DCM) to learn causal mechanisms, that generate unique latent encodings to allow for direct sampling under interventions as well as abduction for counterfactuals. We utilize DCM to model structural equations, seeing that diffusion models serve as a natural candidate here since they encode each node to a latent representation, a proxy for the exogenous noise, and offer flexible and accurate modeling to provide reliable causal statements and estimates. Our empirical evaluations demonstrate significant improvements over existing state-of-the-art methods for answering causal queries. Our theoretical results provide a methodology for analyzing the counterfactual error for general encoder/decoder models which could be of independent interest.

Via

Access Paper or Ask Questions

Thompson Sampling with Diffusion Generative Prior

Jan 12, 2023

Yu-Guan Hsieh, Shiva Prasad Kasiviswanathan, Branislav Kveton, Patrick Blöbaum

Abstract:In this work, we initiate the idea of using denoising diffusion models to learn priors for online decision making problems. Our special focus is on the meta-learning for bandit framework, with the goal of learning a strategy that performs well across bandit tasks of a same class. To this end, we train a diffusion model that learns the underlying task distribution and combine Thompson sampling with the learned prior to deal with new tasks at test time. Our posterior sampling algorithm is designed to carefully balance between the learned prior and the noisy observations that come from the learner's interaction with the environment. To capture realistic bandit scenarios, we also propose a novel diffusion model training procedure that trains even from incomplete and/or noisy data, which could be of independent interest. Finally, our extensive experimental evaluations clearly demonstrate the potential of the proposed approach.

Via

Access Paper or Ask Questions

Sequential Kernelized Independence Testing

Dec 14, 2022

Aleksandr Podkopaev, Patrick Blöbaum, Shiva Prasad Kasiviswanathan, Aaditya Ramdas

Figure 1 for Sequential Kernelized Independence Testing

Figure 2 for Sequential Kernelized Independence Testing

Figure 3 for Sequential Kernelized Independence Testing

Figure 4 for Sequential Kernelized Independence Testing

Abstract:Independence testing is a fundamental and classical statistical problem that has been extensively studied in the batch setting when one fixes the sample size before collecting data. However, practitioners often prefer procedures that adapt to the complexity of a problem at hand instead of setting sample size in advance. Ideally, such procedures should (a) allow stopping earlier on easy tasks (and later on harder tasks), hence making better use of available resources, and (b) continuously monitor the data and efficiently incorporate statistical evidence after collecting new data, while controlling the false alarm rate. It is well known that classical batch tests are not tailored for streaming data settings, since valid inference after data peeking requires correcting for multiple testing, but such corrections generally result in low power. In this paper, we design sequential kernelized independence tests (SKITs) that overcome such shortcomings based on the principle of testing by betting. We exemplify our broad framework using bets inspired by kernelized dependence measures such as the Hilbert-Schmidt independence criterion (HSIC) and the constrained-covariance criterion (COCO). Importantly, we also generalize the framework to non-i.i.d. time-varying settings, for which there exist no batch tests. We demonstrate the power of our approaches on both simulated and real data.

Via

Access Paper or Ask Questions

Uplifting Bandits

Jun 08, 2022

Yu-Guan Hsieh, Shiva Prasad Kasiviswanathan, Branislav Kveton

Abstract:We introduce a multi-armed bandit model where the reward is a sum of multiple random variables, and each action only alters the distributions of some of them. After each action, the agent observes the realizations of all the variables. This model is motivated by marketing campaigns and recommender systems, where the variables represent outcomes on individual customers, such as clicks. We propose UCB-style algorithms that estimate the uplifts of the actions over a baseline. We study multiple variants of the problem, including when the baseline and affected variables are unknown, and prove sublinear regret bounds for all of these. We also provide lower bounds that justify the necessity of our modeling assumptions. Experiments on synthetic and real-world datasets show the benefit of methods that estimate the uplifts over policies that do not use this structure.

Via

Access Paper or Ask Questions

On Codomain Separability and Label Inference from (Noisy) Loss Functions

Jul 07, 2021

Abhinav Aggarwal, Shiva Prasad Kasiviswanathan, Zekun Xu, Oluwaseyi Feyisetan, Nathanael Teissier

Figure 1 for On Codomain Separability and Label Inference from (Noisy) Loss Functions

Figure 2 for On Codomain Separability and Label Inference from (Noisy) Loss Functions

Figure 3 for On Codomain Separability and Label Inference from (Noisy) Loss Functions

Figure 4 for On Codomain Separability and Label Inference from (Noisy) Loss Functions

Abstract:Machine learning classifiers rely on loss functions for performance evaluation, often on a private (hidden) dataset. Label inference was recently introduced as the problem of reconstructing the ground truth labels of this private dataset from just the (possibly perturbed) loss function values evaluated at chosen prediction vectors, without any other access to the hidden dataset. Existing results have demonstrated this inference is possible on specific loss functions like the cross-entropy loss. In this paper, we introduce the notion of codomain separability to formally study the necessary and sufficient conditions under which label inference is possible from any (noisy) loss function values. Using this notion, we show that for many commonly used loss functions, including multiclass cross-entropy with common activation functions and some Bregman divergence-based losses, it is possible to design label inference attacks for arbitrary noise levels. We demonstrate that these attacks can also be carried out through actual neural network models, and argue, both formally and empirically, the role of finite precision arithmetic in this setting.

Via

Access Paper or Ask Questions