Abstract: In this paper, we address conditional testing problems through the conformal inference framework. We define localized conformal p-values by inverting prediction intervals and prove their theoretical properties. These p-values are then applied to several conditional testing problems to illustrate their practicality. First, we propose a conditional outlier detection procedure to test for outliers in the conditional distribution with finite-sample false discovery rate (FDR) control. Second, we introduce a novel conditional label screening problem, whose goal is to screen multivariate response variables, and propose a screening procedure that controls the family-wise error rate (FWER). Finally, we consider the two-sample conditional distribution test and define a weighted U-statistic through the aggregation of localized p-values. Numerical simulations and real-data examples validate the superior performance of our proposed strategies.
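To make the idea of a conformal p-value concrete, the following is a minimal sketch rather than the construction defined in the paper. It shows the standard (marginal) conformal p-value and one possible kernel-weighted variant that emphasizes calibration points near the test covariate. The function names, the Gaussian kernel, and the `bandwidth` parameter are assumptions introduced only for illustration; the paper's localized p-values are obtained by inverting prediction intervals and may differ in form.

```python
import numpy as np


def conformal_p_value(cal_scores, test_score):
    """Standard conformal p-value: rank of the test score among calibration scores.

    Larger scores are treated as more anomalous.
    """
    cal_scores = np.asarray(cal_scores, dtype=float)
    return (1.0 + np.sum(cal_scores >= test_score)) / (cal_scores.size + 1.0)


def kernel_weighted_p_value(cal_x, cal_scores, test_x, test_score, bandwidth=1.0):
    """Hypothetical localized variant (an assumption, not the paper's definition).

    Calibration points are weighted by a Gaussian kernel in the covariate,
    and the test point itself receives weight 1.
    """
    cal_x = np.asarray(cal_x, dtype=float)
    cal_scores = np.asarray(cal_scores, dtype=float)
    weights = np.exp(-0.5 * ((cal_x - test_x) / bandwidth) ** 2)
    return (1.0 + np.sum(weights * (cal_scores >= test_score))) / (1.0 + np.sum(weights))
```

When every calibration covariate equals the test covariate, all kernel weights are 1 and the weighted version reduces to the standard p-value, which is a quick sanity check on the sketch.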
Abstract: We study the problem of post-selection predictive inference in an online fashion. To avoid devoting resources to unimportant units, a preliminary selection of the current individual before reporting its prediction interval is common and meaningful in online predictive tasks. Since the online selection causes a temporal multiplicity in the selected prediction intervals, it is important to control the real-time false coverage-statement rate (FCR), which measures the averaged miscoverage error. We develop a general framework named CAS (Calibration after Adaptive Selection) that can wrap around any prediction model and online selection rule to output post-selection prediction intervals. If the current individual is selected, we first perform an adaptive selection on historical data to construct a calibration set, then output a conformal prediction interval for the unobserved label. We provide tractable constructions of the calibration set for popular online selection rules. We prove that CAS achieves an exact selection-conditional coverage guarantee in the finite-sample and distribution-free regimes. For decision-driven selection rules, which include most online multiple-testing procedures, CAS exactly controls the real-time FCR below the target level without any distributional assumptions. For online selection with symmetric thresholds, we establish an error bound for the FCR control gap under mild distributional assumptions. To account for distribution shift in online data, we also embed CAS into some recent dynamic conformal prediction methods and examine the long-run FCR control. Numerical results on both synthetic and real data corroborate that CAS effectively controls the FCR around the target level and yields narrower prediction intervals than existing baselines across various settings.
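The select-then-calibrate step described above can be illustrated with a deliberately simplified sketch. It assumes a fixed-threshold selection rule (a unit is selected when its prediction exceeds `threshold`), absolute-residual conformity scores, and exchangeable data. The function name `post_selection_interval` and all of its parameters are hypothetical; this is not the CAS procedure in its general form, which handles adaptive and decision-driven rules.

```python
import math

import numpy as np


def post_selection_interval(hist_pred, hist_y, cur_pred, threshold, alpha=0.1):
    """Split-conformal interval calibrated only on historical units that pass
    the same (assumed, fixed-threshold) selection rule as the current unit.

    Returns None when the current unit is not selected.
    """
    hist_pred = np.asarray(hist_pred, dtype=float)
    hist_y = np.asarray(hist_y, dtype=float)

    if cur_pred <= threshold:
        return None  # not selected, so no interval is reported

    # Apply the selection rule to the historical data to build the calibration set.
    keep = hist_pred > threshold
    residuals = np.sort(np.abs(hist_y[keep] - hist_pred[keep]))

    # Conformal quantile: the ceil((1 - alpha) * (n + 1))-th smallest residual.
    n = residuals.size
    k = math.ceil((1.0 - alpha) * (n + 1))
    if k > n:
        return (-math.inf, math.inf)  # too few calibration points for this level
    q = residuals[k - 1]
    return (cur_pred - q, cur_pred + q)
```

Calibrating on the selected historical units, rather than on all of them, is what ties the interval to the selection event in this toy setting; with too few selected units the sketch falls back to an infinite interval instead of reporting an interval it cannot justify.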