There is increasing interest in using observed individual-level data to formulate personalized policy. Examples include heterogeneous pricing, individualized credit offers, and targeted social programs. This paper provides a general model of how personalized policy creates incentives for individuals to modify their behavior to obtain a better treatment. For a given planner objective, we show that standard estimators based on repeated risk minimization produce a suboptimal policy. We propose a dynamic experiment that estimates the optimal treatment allocation function when agents are strategic and that has regret decaying at a linear rate. A key insight is that random variation in how treatment assignment depends on observed characteristics is required; randomized treatment assignment alone is not sufficient to identify the optimal policy. In simulations and in a small MTurk experiment, we show that this experimental method outperforms alternative methods that do not learn strategic effects.