Texture synthesis models are important tools for understanding visual processing. In particular, statistical approaches based on neurally relevant features have been instrumental in understanding aspects of visual perception and of neural coding. New deep learning-based approaches further improve the quality of synthetic textures. Yet it is still unclear why deep texture synthesis performs so well, and applications of this new framework to probe visual perception are scarce. Here, we show that the distributions of deep convolutional neural network (CNN) activations of a texture are well described by elliptical distributions; therefore, following optimal transport theory, constraining their mean and covariance is sufficient to generate new texture samples. We then propose using the natural geodesics (i.e., shortest paths between two points) induced by the optimal transport metric to interpolate between arbitrary textures. Comparison with alternative interpolation methods suggests that ours more closely matches the geometry of texture perception and is better suited to studying its statistical nature. We demonstrate our method by measuring, in human observers, the perceptual scale associated with the interpolation parameter and, in macaque monkeys, the neural sensitivity of different areas of visual cortex.
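To make the optimal transport argument concrete, the sketch below works through the Gaussian case, the simplest elliptical distribution, where the 2-Wasserstein distance and its geodesic depend only on the mean and covariance. It is an illustrative assumption, not the authors' implementation: the function names (`psd_sqrt`, `gaussian_ot_map`, `w2_gaussian`, `geodesic`) are hypothetical, and the real method applies this idea to CNN activation statistics inside a texture synthesis loop rather than to raw vectors. The interpolated covariance follows McCann's displacement interpolation, `A_t S0 A_t` with `A_t = (1 - t) I + t A`, where `A` is the linear part of the optimal map from `N(m0, S0)` to `N(m1, S1)`.

```python
import numpy as np


def psd_sqrt(m: np.ndarray) -> np.ndarray:
    """Symmetric square root of a symmetric positive semi-definite matrix."""
    m = 0.5 * (m + m.T)  # symmetrize to absorb floating-point asymmetry
    vals, vecs = np.linalg.eigh(m)
    vals = np.clip(vals, 0.0, None)  # drop tiny negative round-off eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T


def gaussian_ot_map(s0: np.ndarray, s1: np.ndarray) -> np.ndarray:
    """Linear part A of the optimal transport map N(m0, s0) -> N(m1, s1).

    A = s0^{-1/2} (s0^{1/2} s1 s0^{1/2})^{1/2} s0^{-1/2}; assumes s0 is
    positive definite so that its square root is invertible.
    """
    r0 = psd_sqrt(s0)
    r0_inv = np.linalg.inv(r0)
    return r0_inv @ psd_sqrt(r0 @ s1 @ r0) @ r0_inv


def w2_gaussian(m0, s0, m1, s1) -> float:
    """Closed-form 2-Wasserstein distance between two Gaussians."""
    r0 = psd_sqrt(s0)
    cross = psd_sqrt(r0 @ s1 @ r0)
    sq = np.sum((m0 - m1) ** 2) + np.trace(s0 + s1 - 2.0 * cross)
    return float(np.sqrt(max(sq, 0.0)))


def geodesic(m0, s0, m1, s1, t: float):
    """Mean and covariance at position t in [0, 1] along the W2 geodesic."""
    a = gaussian_ot_map(s0, s1)
    a_t = (1.0 - t) * np.eye(len(m0)) + t * a
    return (1.0 - t) * m0 + t * m1, a_t @ s0 @ a_t


# Usage: interpolate halfway between two random 4-D Gaussians.
rng = np.random.default_rng(0)
b0, b1 = rng.standard_normal((2, 4, 4))
s0 = b0 @ b0.T + 0.5 * np.eye(4)
s1 = b1 @ b1.T + 0.5 * np.eye(4)
m0, m1 = rng.standard_normal((2, 4))
m_half, s_half = geodesic(m0, s0, m1, s1, 0.5)
```

Because the path is a constant-speed geodesic, the point at parameter `t` lies at distance `t * W2(mu_0, mu_1)` from the start. This linear relation between the interpolation parameter and the metric is what makes such a path a natural candidate for measuring perceptual scales.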