Abstract: Constructing reliable prediction sets is an obstacle for applications of neural models: distribution-free conditional coverage is theoretically impossible, and the exchangeability assumption underpinning the coverage guarantees of standard split-conformal approaches is violated under domain shift. Given these challenges, we propose and analyze a data-driven procedure for obtaining empirically reliable approximate conditional coverage, calculating a distinct quantile threshold for each label and each test point. We achieve this by exploiting the strong signals for prediction reliability provided by KNN-based approximations of the model over the training set, together with approximations over constrained samples from the held-out calibration set. We demonstrate that split-conformal alternatives with marginal coverage guarantees can suffer substantial (and otherwise unknowable) under-coverage when these distances and constraints are not taken into account. Our experiments span protein secondary structure prediction, grammatical error detection, sentiment classification, and fact verification, covering supervised sequence labeling, zero-shot sequence labeling (i.e., feature detection), document classification (with sparsity/interpretability constraints), and retrieval-classification, including class-imbalanced and domain-shifted settings.
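For orientation, the following is a minimal sketch of the standard split-conformal baseline with one quantile threshold per label, the kind of approach the abstract contrasts against. It is not the procedure proposed here: it omits the KNN-based distances and the constrained calibration samples, and its thresholds are shared by all test points. The function names, the nonconformity score (one minus the predicted class probability), and the array layout are assumptions chosen for illustration.

```python
import math

import numpy as np


def conformal_quantile(scores, alpha):
    """Finite-sample split-conformal quantile of nonconformity scores.

    Returns the k-th smallest score with k = ceil((n + 1) * (1 - alpha)),
    or +inf when the calibration sample is too small for the target level.
    """
    scores = np.sort(np.asarray(scores, dtype=float))
    n = scores.shape[0]
    k = math.ceil((n + 1) * (1.0 - alpha))
    if n == 0 or k > n:
        return float("inf")
    return float(scores[k - 1])


def classwise_thresholds(cal_probs, cal_labels, alpha):
    """One threshold per label, from calibration points with that true label.

    Assumed score: 1 - predicted probability of the true class.
    """
    n_classes = cal_probs.shape[1]
    thresholds = np.empty(n_classes)
    for c in range(n_classes):
        mask = cal_labels == c
        thresholds[c] = conformal_quantile(1.0 - cal_probs[mask, c], alpha)
    return thresholds


def prediction_sets(test_probs, thresholds):
    """Boolean matrix: entry (i, c) is True if label c is in the set for point i."""
    return (1.0 - test_probs) <= thresholds
```

Under exchangeability, thresholds of this form give coverage guarantees on average over test points (here, within each label), not conditional on the individual input, and such guarantees need not hold after a domain shift.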