Abstract:Model selection/optimization in conformal inference is challenging, since it may break the exchangeability between labeled and unlabeled data. We study this problem in the context of conformal selection, which uses conformal p-values to select ``interesting'' instances with large unobserved labels from a pool of unlabeled data, while controlling the FDR in finite sample. For validity, existing solutions require the model choice to be independent of the data used to construct the p-values and calibrate the selection set. However, when presented with many model choices and limited labeled data, it is desirable to (i) select the best model in a data-driven manner, and (ii) mitigate power loss due to sample splitting. This paper presents OptCS, a general framework that allows valid statistical testing (selection) after flexible data-driven model optimization. We introduce general conditions under which OptCS constructs valid conformal p-values despite substantial data reuse and handles complex p-value dependencies to maintain finite-sample FDR control via a novel multiple testing procedure. We instantiate this general recipe to propose three FDR-controlling procedures, each optimizing the models differently: (i) selecting the most powerful one among multiple pre-trained candidate models, (ii) using all data for model fitting without sample splitting, and (iii) combining full-sample model fitting and selection. We demonstrate the efficacy of our methods via simulation studies and real applications in drug discovery and alignment of large language models in radiology report generation.
Abstract:In mechanism design, the facility location game is an extensively studied problem. In the classical model, the cost of each agent is her distance to the nearest facility. In this paper, we consider a new model, where there is a location-dependent entrance fee to the facility. Thus, in our model, the cost of each agent is the sum of the distance to the facility and the entrance fee of the facility. This is a refined generalization of the classical model. We study the model and design strategyproof mechanisms. For one and two facilities, we provide upper and lower bounds for the approximation ratio given by deterministic and randomized mechanisms, with respect to the utilitarian objective and the egalitarian objective. Most of our bounds are tight and these bounds are independent of the entrance fee functions. Our results are as general as possible because the entrance fee function we consider is arbitrary.