Title: We don't need no labels: Estimating post-deployment model performance under covariate shift without ground truth

Jan 16, 2024

Jakub Białek, Wojtek Kuberski, Nikolaos Perrakis

Abstract: The performance of machine learning models often degrades after deployment due to data distribution shifts. In many use cases, it is impossible to calculate the post-deployment performance because labels are unavailable or significantly delayed. Proxy methods for evaluating model performance stability, like drift detection techniques, do not properly quantify data distribution shift impact. As a solution, we propose a robust and accurate performance estimation method for evaluating ML classification models on unlabeled data that accurately quantifies the impact of covariate shift on model performance. We call it multi-calibrated confidence-based performance estimation (M-CBPE). It is model and data-type agnostic and works for any performance metric. It does not require access to the monitored model - it uses the model predictions and probability estimates. M-CBPE does not need user input on the nature of the covariate shift as it fully learns from the data. We evaluate it with over 600 dataset-model pairs from US census data and compare it with multiple benchmarks using several evaluation metrics. Results show that M-CBPE is the best method to estimate the performance of classification models in any evaluation context.
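The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general confidence-based idea it builds on, not the authors' M-CBPE implementation. It assumes a binary classifier whose probability estimates are already calibrated on the monitored data (the multi-calibration step that the paper's name refers to is not shown), and the function name `estimate_accuracy` is hypothetical. Under that assumption, a prediction of 1 is correct with probability p and a prediction of 0 with probability 1 - p, so averaging these values gives an expected accuracy without any labels.

```python
from typing import Sequence


def estimate_accuracy(probs: Sequence[float], preds: Sequence[int]) -> float:
    """Expected accuracy of binary predictions, computed without labels.

    probs: estimates of P(y = 1 | x), ASSUMED to be calibrated.
    preds: the model's hard predictions (0 or 1).
    """
    if len(probs) != len(preds) or not probs:
        raise ValueError("probs and preds must be non-empty and of equal length")
    # A prediction of 1 is correct with probability p; a prediction of 0
    # is correct with probability 1 - p.
    per_row = [p if y_hat == 1 else 1.0 - p for p, y_hat in zip(probs, preds)]
    return sum(per_row) / len(per_row)


# Illustrative inputs only, not data from the paper.
probs = [0.9, 0.2, 0.6, 0.4]
preds = [1, 0, 1, 0]
estimated = estimate_accuracy(probs, preds)
print(round(estimated, 3))
```

The estimate is only as good as the calibration: if covariate shift makes the probabilities miscalibrated, the estimate is biased, which is the gap a recalibration step would need to close.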
