Contextual bandits often provide simple and effective personalization in decision-making problems, making them popular in many domains, including digital health. However, when bandits are deployed in the context of a scientific study, the aim is not only to personalize for an individual, but also to determine, with sufficient statistical power, whether or not the system's intervention is effective. In this work, we develop a set of constraints and a general meta-algorithm that can be used to both guarantee power constraints and minimize regret. Our results demonstrate that a number of existing algorithms can be easily modified to satisfy the constraint without a significant decrease in average return. We also show that our modification is robust to a variety of model mis-specifications.

Title: Power-Constrained Bandits
