Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Dan M. Kluger

Prediction-Powered Inference with Imputed Covariates and Nonuniform Sampling

Jan 30, 2025

Dan M. Kluger, Kerri Lu, Tijana Zrnic, Sherrie Wang, Stephen Bates

Abstract:Machine learning models are increasingly used to produce predictions that serve as input data in subsequent statistical analyses. For example, computer vision predictions of economic and environmental indicators based on satellite imagery are used in downstream regressions; similarly, language models are widely used to approximate human ratings and opinions in social science research. However, failure to properly account for errors in the machine learning predictions renders standard statistical procedures invalid. Prior work uses what we call the Predict-Then-Debias estimator to give valid confidence intervals when machine learning algorithms impute missing variables, assuming a small complete sample from the population of interest. We expand the scope by introducing bootstrap confidence intervals that apply when the complete data is a nonuniform (i.e., weighted, stratified, or clustered) sample and to settings where an arbitrary subset of features is imputed. Importantly, the method can be applied to many settings without requiring additional calculations. We prove that these confidence intervals are valid under no assumptions on the quality of the machine learning model and are no wider than the intervals obtained by methods that do not use machine learning predictions.

Via

Access Paper or Ask Questions

Two Shifts for Crop Mapping: Leveraging Aggregate Crop Statistics to Improve Satellite-based Maps in New Regions

Sep 02, 2021

Dan M. Kluger, Sherrie Wang, David B. Lobell

Figure 1 for Two Shifts for Crop Mapping: Leveraging Aggregate Crop Statistics to Improve Satellite-based Maps in New Regions

Figure 2 for Two Shifts for Crop Mapping: Leveraging Aggregate Crop Statistics to Improve Satellite-based Maps in New Regions

Figure 3 for Two Shifts for Crop Mapping: Leveraging Aggregate Crop Statistics to Improve Satellite-based Maps in New Regions

Figure 4 for Two Shifts for Crop Mapping: Leveraging Aggregate Crop Statistics to Improve Satellite-based Maps in New Regions

Abstract:Crop type mapping at the field level is critical for a variety of applications in agricultural monitoring, and satellite imagery is becoming an increasingly abundant and useful raw input from which to create crop type maps. Still, in many regions crop type mapping with satellite data remains constrained by a scarcity of field-level crop labels for training supervised classification models. When training data is not available in one region, classifiers trained in similar regions can be transferred, but shifts in the distribution of crop types as well as transformations of the features between regions lead to reduced classification accuracy. We present a methodology that uses aggregate-level crop statistics to correct the classifier by accounting for these two types of shifts. To adjust for shifts in the crop type composition we present a scheme for properly reweighting the posterior probabilities of each class that are output by the classifier. To adjust for shifts in features we propose a method to estimate and remove linear shifts in the mean feature vector. We demonstrate that this methodology leads to substantial improvements in overall classification accuracy when using Linear Discriminant Analysis (LDA) to map crop types in Occitanie, France and in Western Province, Kenya. When using LDA as our base classifier, we found that in France our methodology led to percent reductions in misclassifications ranging from 2.8% to 42.2% (mean = 21.9%) over eleven different training departments, and in Kenya the percent reductions in misclassification were 6.6%, 28.4%, and 42.7% for three training regions. While our methodology was statistically motivated by the LDA classifier, it can be applied to any type of classifier. As an example, we demonstrate its successful application to improve a Random Forest classifier.

* Remote Sensing of Environment (2021). Volume 262
* This is the revised version of the paper which was submitted to Remote Sensing of Environment on May 3, 2021, immediately prior to acceptance for publication

Via

Access Paper or Ask Questions