Non-additive measures, also known as fuzzy measures, capacities, and monotonic games, are increasingly used in different fields. Applications have been built within computer science and artificial intelligence related to e.g. decision making, image processing, machine learning for both classification, and regression. Tools for measure identification have been built. In short, as non-additive measures are more general than additive ones (i.e., than probabilities), they have better modeling capabilities allowing to model situations and problems that cannot be modeled by the latter. See e.g. the application of non-additive measures and the Choquet integral to model both Ellsberg paradox and Allais paradox. Because of that, there is an increasing need to analyze non-additive measures. The need for distances and similarities to compare them is no exception. Some work has been done for defining $f$-divergence for them. In this work we tackle the problem of defining the optimal transport problem for non-additive measures. Distances for pairs of probability distributions based on the optimal transport are extremely used in practical applications, and they are being studied extensively for their mathematical properties. We consider that it is necessary to provide appropriate definitions with a similar flavour, and that generalize the standard ones, for non-additive measures. We provide definitions based on the M\"obius transform, but also based on the $(\max, +)$-transform that we consider that has some advantages. We will discuss in this paper the problems that arise to define the transport problem for non-additive measures, and discuss ways to solve them. In this paper we provide the definitions of the optimal transport problem, and prove some properties.