Abstract: Monitoring the performance of classification models in production is critical yet challenging due to strict labeling budgets, one-shot batch acquisition of labels, and extremely low error rates. We propose a general framework based on Stratified Importance Sampling (SIS) that directly addresses these constraints in model monitoring. While SIS has previously been applied in specialized domains, our theoretical analysis establishes its broad applicability to monitoring classification models. Under mild conditions, SIS yields unbiased estimators with strict finite-sample mean squared error (MSE) improvements over both importance sampling (IS) and stratified random sampling (SRS). The framework does not rely on optimally defined proposal distributions or strata: even with noisy proxies and sub-optimal stratification, SIS can improve estimator efficiency compared to IS or SRS individually, though extreme proposal mismatch may limit these gains. Experiments across binary and multiclass tasks demonstrate consistent efficiency improvements under fixed label budgets, underscoring SIS as a principled, label-efficient, and operationally lightweight methodology for post-deployment model monitoring.
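The abstract does not spell out the estimator, so the following is only a minimal sketch of one way a stratified importance sampling estimate of a deployed model's error rate could look under a fixed label budget. All names (`sis_error_rate`, `labels_oracle`, `proxy_scores`) are hypothetical; proportional budget allocation and the proxy-based within-stratum proposal are illustrative assumptions, not the paper's prescribed choices.

```python
import numpy as np

def sis_error_rate(proxy_scores, strata, labels_oracle, budget, rng=None):
    """Sketch: stratified importance sampling estimate of error rate.

    proxy_scores  : nonnegative proxy values (e.g. model uncertainty)
                    used to build the within-stratum proposal.
    strata        : integer stratum id per production example.
    labels_oracle : callable idx -> 0/1 error indicator; stands in for
                    the human labeling step and is called ~budget times.
    """
    rng = np.random.default_rng() if rng is None else rng
    proxy_scores = np.asarray(proxy_scores, dtype=float)
    strata = np.asarray(strata)
    n_total = len(proxy_scores)

    stratum_ids, counts = np.unique(strata, return_counts=True)
    # Proportional allocation of the label budget across strata
    # (a variance-aware allocation would typically be preferable).
    alloc = np.maximum(1, np.round(budget * counts / n_total)).astype(int)

    estimate = 0.0
    for sid, n_h, m_h in zip(stratum_ids, counts, alloc):
        idx = np.flatnonzero(strata == sid)
        q = proxy_scores[idx] + 1e-12          # avoid zero-probability items
        q = q / q.sum()                        # within-stratum proposal
        pos = rng.choice(len(idx), size=m_h, replace=True, p=q)
        weights = 1.0 / (n_h * q[pos])         # importance weights
        errors = np.array([labels_oracle(j) for j in idx[pos]], dtype=float)
        mu_h = np.mean(errors * weights)       # unbiased stratum estimate
        estimate += (n_h / n_total) * mu_h     # combine with stratum share
    return estimate
```

Within each stratum the weighted mean is unbiased for the stratum error rate regardless of the proposal, so the combined estimate stays unbiased even with noisy proxies and sub-optimal strata, consistent with the claim in the abstract.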
Abstract: Heavy-tailed distributions naturally arise in many settings, from finance to telecommunications. While regret minimization under sub-Gaussian or bounded-support rewards has been widely studied, learning on heavy-tailed distributions has only gained popularity over the last decade. In the stochastic heavy-tailed bandit problem, an agent learns under the assumption that the distributions have finite moments of maximum order $1+\epsilon$, uniformly bounded by a constant $u$, for some $\epsilon \in (0,1]$. To the best of our knowledge, the literature only provides algorithms that require these two quantities as input. In this paper, we study the stochastic adaptive heavy-tailed bandit, a variation of the standard setting where both $\epsilon$ and $u$ are unknown to the agent. We show that adaptivity comes at a cost by introducing two lower bounds on the regret of any adaptive algorithm, implying higher regret than in the standard setting. Finally, we introduce a specific distributional assumption and provide Adaptive Robust UCB, a regret minimization strategy that matches the known lower bound for the heavy-tailed MAB problem.
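For context, the sketch below shows the standard (non-adaptive) Robust UCB baseline with truncated empirical means, which takes both $\epsilon$ and $u$ as inputs; this is the requirement the paper's Adaptive Robust UCB removes. It is not the paper's algorithm, whose details are not given in the abstract. The names (`robust_ucb`, `pulls_fn`) and the choice $\delta = t^{-2}$ are illustrative assumptions.

```python
import numpy as np

def robust_ucb(pulls_fn, n_arms, horizon, eps, u, rng=None):
    """Sketch: Robust UCB with truncated empirical means.

    pulls_fn : callable (arm, rng) -> heavy-tailed reward sample.
    eps, u   : moment order 1+eps and uniform moment bound,
               assumed known here (unlike in the adaptive setting).
    """
    rng = np.random.default_rng() if rng is None else rng
    rewards = [[] for _ in range(n_arms)]

    def index(arm, t):
        obs = np.asarray(rewards[arm])
        n = len(obs)
        log_term = np.log(t ** 2)               # confidence level delta = 1/t^2
        s = np.arange(1, n + 1)
        # Truncate the s-th observation at (u * s / log(1/delta))^{1/(1+eps)}
        thresh = (u * s / log_term) ** (1.0 / (1.0 + eps))
        trunc_mean = np.mean(np.where(np.abs(obs) <= thresh, obs, 0.0))
        # Exploration bonus for the truncated-mean estimator
        bonus = 4.0 * u ** (1.0 / (1.0 + eps)) * (log_term / n) ** (eps / (1.0 + eps))
        return trunc_mean + bonus

    for arm in range(n_arms):                   # pull each arm once
        rewards[arm].append(pulls_fn(arm, rng))
    for t in range(n_arms + 1, horizon + 1):
        arm = max(range(n_arms), key=lambda a: index(a, t))
        rewards[arm].append(pulls_fn(arm, rng))
    return [len(r) for r in rewards]            # pull counts per arm
```

The index trades off a truncated mean (robust to heavy tails) against a bonus that scales as $(\log(1/\delta)/n)^{\epsilon/(1+\epsilon)}$; both the truncation threshold and the bonus depend explicitly on $\epsilon$ and $u$, which is precisely the knowledge the adaptive setting studied in the paper does without.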