Abstract: Causal discovery can be a powerful tool for investigating causality when a system can be observed but is, in practice, inaccessible to experiments. Despite this, it is rarely used in scientific or medical fields. One major hurdle preventing causal discovery from having a larger impact is the difficulty of determining when the output of a causal discovery method can be trusted in a real-world setting. Trust is especially critical when human health is on the line. In this paper, we report the results of a series of simulation studies investigating how well different resampling methods indicate confidence in discovered graph features. We found that subsampling and sampling with replacement both performed surprisingly well, suggesting that they can serve as grounds for confidence in graph features. We also found that the calibration of the two methods had different convergence properties, suggesting that the choice between them should depend on the sample size.