Abstract: Randomized experiments are the preferred approach for evaluating the effects of interventions, but they are costly and often yield estimates with substantial uncertainty. In silico experiments that leverage foundation models offer a cost-effective alternative that can potentially attain higher statistical precision. However, these benefits come with a significant risk: statistical inferences are not valid if the models fail to accurately predict experimental responses to interventions. In this paper, we propose a novel approach that integrates the predictions from multiple foundation models with experimental data while preserving valid statistical inference. Our estimator is consistent and asymptotically normal, with asymptotic variance no larger than that of the standard estimator based on experimental data alone. Importantly, these statistical properties hold even when model predictions are arbitrarily biased. Empirical results across several randomized experiments show that our estimator offers substantial precision gains, equivalent to a reduction of up to 20% in the sample size needed to match the precision of the standard estimator based on experimental data alone.
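To make the idea of combining model predictions with experimental data concrete, the following is a minimal sketch and not the estimator proposed in the paper. It assumes a two-arm randomized experiment and treats the predictions of several models as control variates for the difference-in-means estimator. The function name `prediction_augmented_ate`, its arguments, and the returned keys are hypothetical. The sketch relies on the predictions being computed from pre-treatment information only, so that their difference in means across arms has expectation zero however biased the predictions are.

```python
import numpy as np


def _cov(a, b):
    """Sample cross-covariance between the columns of a (n, p) and b (n, q)."""
    a_c = a - a.mean(axis=0)
    b_c = b - b.mean(axis=0)
    return a_c.T @ b_c / (len(a) - 1)


def prediction_augmented_ate(y, t, preds):
    """Difference in means adjusted with model predictions as control variates.

    Hypothetical illustration (not the paper's estimator).
    y     : (n,) observed outcomes
    t     : (n,) 0/1 randomized treatment assignment
    preds : (n,) or (n, k) predictions from k models, computed without
            using the realized assignment or outcome
    """
    y = np.asarray(y, dtype=float)
    t = np.asarray(t)
    f = np.asarray(preds, dtype=float).reshape(len(y), -1)
    y1, y0 = y[t == 1], y[t == 0]
    f1, f0 = f[t == 1], f[t == 0]
    n1, n0 = len(y1), len(y0)

    # Standard estimator: difference in means and its variance estimate.
    tau_dm = y1.mean() - y0.mean()
    var_dm = y1.var(ddof=1) / n1 + y0.var(ddof=1) / n0

    # Difference in mean predictions across arms: mean zero under
    # randomization, regardless of how biased the predictions are.
    delta = f1.mean(axis=0) - f0.mean(axis=0)
    v = _cov(f1, f1) / n1 + _cov(f0, f0) / n0
    c = (_cov(f1, y1[:, None]) / n1 + _cov(f0, y0[:, None]) / n0).ravel()

    # Variance-minimizing coefficients; the pseudo-inverse handles
    # collinear or constant predictions.
    lam = np.linalg.pinv(v) @ c
    tau_aug = tau_dm - delta @ lam
    var_aug = max(var_dm - c @ lam, 0.0)

    return {
        "tau_dm": tau_dm,
        "se_dm": np.sqrt(var_dm),
        "tau_aug": tau_aug,
        "se_aug": np.sqrt(var_aug),
    }
```

In this sketch the estimated variance of the adjusted estimator cannot exceed that of the difference in means, because the quadratic form subtracted from it is non-negative. Uninformative predictions receive coefficients near zero, and the adjusted estimate then stays close to the standard one.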
Abstract: The rich longitudinal individual-level data available from electronic health records (EHRs) can be used to examine treatment effect heterogeneity. However, estimating treatment effects using EHR data poses several challenges, including time-varying confounding; repeated and temporally non-aligned measurements of covariates, treatment assignments, and outcomes; and loss to follow-up due to dropout. Here, we develop the Subgroup Discovery for Longitudinal Data (SDLD) algorithm, a tree-based algorithm for discovering subgroups with heterogeneous treatment effects using longitudinal data. SDLD combines the generalized interaction tree algorithm, a general data-driven method for subgroup discovery, with longitudinal targeted maximum likelihood estimation. We apply the algorithm to EHR data to discover subgroups of people living with human immunodeficiency virus (HIV) who are at higher risk of weight gain when receiving dolutegravir-containing antiretroviral therapies (ARTs) than when receiving non-dolutegravir-containing ARTs.
Abstract: We propose Causal Interaction Trees for identifying subgroups of participants that have enhanced treatment effects using observational data. We extend the Classification and Regression Tree algorithm with splitting criteria that maximize between-group treatment effect heterogeneity, computed from subgroup-specific treatment effect estimators. We derive properties of three subgroup-specific treatment effect estimators that account for the observational nature of the data: inverse probability weighting, g-formula, and doubly robust estimators. We study the performance of the proposed algorithms using simulations and apply them to an observational study that evaluates the effectiveness of right heart catheterization in critically ill patients.
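As a rough illustration of the three estimator types named above, the following minimal sketch computes textbook versions of each within a subgroup. It is not the paper's implementation, and all function names are hypothetical. It assumes the nuisance estimates are already available as arrays: the propensity score `e` and the outcome regressions `mu1` and `mu0`. The `split_statistic` function is likewise only an assumed example of a heterogeneity-based splitting criterion and may differ from the one used in the paper.

```python
import numpy as np


def ipw_effect(y, a, e, mask):
    """Inverse probability weighting (unnormalized) within the subgroup `mask`."""
    w = a / e - (1 - a) / (1 - e)
    return np.mean((w * y)[mask])


def g_formula_effect(mu1, mu0, mask):
    """G-formula: average difference of outcome-model predictions in the subgroup."""
    return np.mean((mu1 - mu0)[mask])


def dr_scores(y, a, e, mu1, mu0):
    """Per-unit doubly robust (augmented IPW) scores."""
    return mu1 - mu0 + a * (y - mu1) / e - (1 - a) * (y - mu0) / (1 - e)


def dr_effect(y, a, e, mu1, mu0, mask):
    """Doubly robust effect estimate: mean score within the subgroup."""
    return np.mean(dr_scores(y, a, e, mu1, mu0)[mask])


def split_statistic(scores, left):
    """Assumed example criterion: squared standardized difference between
    the mean scores of the two child nodes of a candidate split."""
    l, r = scores[left], scores[~left]
    diff = l.mean() - r.mean()
    var = l.var(ddof=1) / len(l) + r.var(ddof=1) / len(r)
    return diff ** 2 / var
```

A tree-growing routine could evaluate `split_statistic` over candidate cut points and keep the largest value. In practice the nuisance functions would be estimated, for example by logistic and linear regression, rather than supplied as known quantities.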