Abstract: We study optimal transport-based distributionally robust optimization problems where a fictitious adversary, often envisioned as nature, can choose the distribution of the uncertain problem parameters by reshaping a prescribed reference distribution at a finite transportation cost. In this framework, we show that robustification is intimately related to various forms of variation and Lipschitz regularization even if the transportation cost function fails to be (some power of) a metric. We also derive conditions for the existence and the computability of a Nash equilibrium between the decision-maker and nature, and we demonstrate numerically that nature's Nash strategy can be viewed as a distribution that is supported on remarkably deceptive adversarial samples. Finally, we identify practically relevant classes of optimal transport-based distributionally robust optimization problems that can be addressed with efficient gradient descent algorithms even if the loss function or the transportation cost function is nonconvex (but not both at the same time).
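For concreteness, a minimal sketch of the decision-maker-versus-nature game described above, in generic notation (the symbols $\theta$, $\ell$, $c$, $\hat{P}$, and $\varepsilon$ are introduced here for illustration and need not match the paper's):
\[
  \inf_{\theta \in \Theta} \; \sup_{Q \,:\, \mathrm{OT}_c(Q, \hat{P}) \le \varepsilon} \; \mathbb{E}_{\xi \sim Q}\bigl[\ell(\theta, \xi)\bigr],
  \qquad
  \mathrm{OT}_c(Q, \hat{P}) = \inf_{\pi \in \Pi(Q, \hat{P})} \mathbb{E}_{(\xi, \xi') \sim \pi}\bigl[c(\xi, \xi')\bigr],
\]
where $\Pi(Q, \hat{P})$ denotes the set of couplings of $Q$ and $\hat{P}$, so nature may reshape the reference distribution $\hat{P}$ within a transportation budget $\varepsilon$. When $c$ is (a power of) a metric, $\mathrm{OT}_c$ is a Wasserstein distance; the abstract's point is that the regularization interpretation survives even when it is not.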
Abstract: Wasserstein distributionally robust optimization has recently emerged as a powerful framework for robust estimation, enjoying good out-of-sample performance guarantees, well-understood regularization effects, and computationally tractable dual reformulations. In this framework, the estimator is obtained by minimizing the worst-case expected loss over all probability distributions that are close, in a Wasserstein sense, to the empirical distribution. In this paper, we propose a Wasserstein distributionally robust M-estimation framework to estimate an unknown parameter from noisy linear measurements, and we focus on the important and challenging task of analyzing the squared-error performance of such estimators. Our study is carried out in the modern high-dimensional proportional regime, where both the ambient dimension and the number of samples go to infinity at a proportional rate that encodes the under-/over-parametrization of the problem. Under an isotropic Gaussian features assumption, we show that the squared error can be recovered as the solution of a convex-concave optimization problem which, surprisingly, involves at most four scalar variables. To the best of our knowledge, this is the first work to study this problem in the context of Wasserstein distributionally robust M-estimation.
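As a sketch of the setup described in this abstract (the symbols $\beta$, $\ell$, $W$, and $\varepsilon$ are assumed notation, not taken from the paper), the Wasserstein distributionally robust M-estimator for noisy linear measurements $y_i = x_i^\top \beta_0 + w_i$, $i = 1, \dots, n$, can be written as
\[
  \hat{\beta} \in \arg\min_{\beta \in \mathbb{R}^d} \; \sup_{Q \,:\, W(Q, \hat{P}_n) \le \varepsilon} \; \mathbb{E}_{(x, y) \sim Q}\bigl[\ell\bigl(y - x^\top \beta\bigr)\bigr],
\]
where $\hat{P}_n = \frac{1}{n} \sum_{i=1}^n \delta_{(x_i, y_i)}$ is the empirical distribution of the samples and $W$ a Wasserstein distance. The proportional regime then corresponds to $n, d \to \infty$ with $d/n$ converging to a fixed ratio that encodes under-/over-parametrization, and the quantity analyzed is the squared error $\|\hat{\beta} - \beta_0\|_2^2$.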