Robust optimal transport (OT) seeks the best transport plan with respect to the worst-case cost function. In this work, we study novel robust OT formulations in which the cost function is parameterized by a symmetric positive semi-definite Mahalanobis metric. In particular, we study several regularizations on the Mahalanobis metric (element-wise $p$-norm, KL-divergence, or doubly-stochastic constraint) and show that the resulting optimization formulations can be considerably simplified by exploiting the problem structure. For large-scale applications, we additionally propose a suitable low-dimensional decomposition of the Mahalanobis metric for the studied robust OT problems. Overall, we view the robust OT (min-max) optimization problems as non-linear OT (minimization) problems, which we solve using a Frank-Wolfe algorithm. We also discuss the use of the robust OT distance as a loss function in multi-class/multi-label classification problems. Empirical results on several real-world tag prediction and multi-class datasets show the benefit of our modeling approach.
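To make the min-max to minimization reduction concrete, the following is a minimal sketch of one illustrative instance, not necessarily the exact formulation studied in this work. It assumes a Frobenius-norm ball on the metric (the $p = 2$ case), i.e. $\min_{\gamma} \max_{M \succeq 0, \|M\|_F \le 1} \langle M, V(\gamma) \rangle$ with $V(\gamma) = \sum_{ij} \gamma_{ij} (x_i - y_j)(x_i - y_j)^\top$. Since $V(\gamma)$ is positive semi-definite, the inner maximum is attained at $M^* = V(\gamma)/\|V(\gamma)\|_F$, which leaves the non-linear OT problem $\min_{\gamma} \|V(\gamma)\|_F$. The sketch further assumes uniform marginals and equally many source and target points, so that the linear step of Frank-Wolfe reduces to an assignment problem. The function name `robust_ot_frank_wolfe`, the step-size rule $2/(t+2)$, and all parameter names are hypothetical choices made for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def robust_ot_frank_wolfe(X, Y, n_iters=100):
    """Frank-Wolfe sketch for min_gamma ||V(gamma)||_F.

    Assumptions (illustrative only): uniform marginals 1/n, the same
    number n of source and target points, Frobenius-norm ball on M.
    """
    n, d = X.shape
    assert Y.shape == (n, d), "this sketch assumes equal-size point sets"

    # D[i, j] = x_i - y_j, shape (n, n, d)
    D = X[:, None, :] - Y[None, :, :]

    def second_moment(gamma):
        # V(gamma) = sum_ij gamma_ij (x_i - y_j)(x_i - y_j)^T
        return np.einsum("ij,ijk,ijl->kl", gamma, D, D)

    # Independent coupling: feasible, every row and column sums to 1/n.
    gamma = np.full((n, n), 1.0 / n**2)

    for t in range(n_iters):
        V = second_moment(gamma)
        norm_V = np.linalg.norm(V)  # Frobenius norm
        if norm_V == 0.0:
            break  # objective is already at its lower bound
        M = V / norm_V  # closed-form worst-case metric for the current plan

        # Gradient of ||V(gamma)||_F: G[i, j] = (x_i - y_j)^T M (x_i - y_j)
        G = np.einsum("ijk,kl,ijl->ij", D, M, D)

        # Linear minimization step: a standard OT problem with cost G.
        # With uniform marginals and equal sizes, a vertex of the
        # transport polytope is a permutation matrix scaled by 1/n.
        rows, cols = linear_sum_assignment(G)
        S = np.zeros((n, n))
        S[rows, cols] = 1.0 / n

        step = 2.0 / (t + 2.0)  # standard diminishing step size
        gamma = (1.0 - step) * gamma + step * S

    return gamma, np.linalg.norm(second_moment(gamma))


# Usage on synthetic data (values depend on the random draw).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(6, 3))
Y_demo = rng.normal(size=(6, 3))
plan, value = robust_ot_frank_wolfe(X_demo, Y_demo, n_iters=50)
print(plan.sum(axis=0), plan.sum(axis=1), value)
```

Every iterate is a convex combination of feasible plans, so the marginal constraints hold throughout without any projection. For general marginals or unequal point-set sizes, the assignment step would need to be replaced by a linear OT solver.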