The domain adaptation (DA) problem on symmetric positive definite (SPD) manifolds has attracted growing interest in the machine learning community, as SPD-matrix representations show increasing potential in many non-stationary application scenarios. This paper generalizes joint distribution adaptation (JDA) to align the source and target domains on SPD manifolds and proposes a deep network architecture, Deep Optimal Transport (DOT), that combines the generalized JDA with existing deep network architectures on SPD manifolds. The specific architecture of DOT enables it to learn an approximate optimal transport (OT) solution to DA problems on SPD manifolds. In experiments on two highly non-stationary cross-session scenarios in brain-computer interfaces (BCIs), DOT improves the average accuracy by 2.32% and 2.92%, respectively. Visualizations of the source and target domains before and after the transformation further demonstrate the validity of DOT.
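The abstract does not spell out how JDA is generalized to SPD manifolds. As a rough illustration only, the sketch below assumes the log-Euclidean metric, under which SPD matrices are mapped to a flat tangent space by the matrix logarithm, and computes a JDA-style discrepancy there: the squared distance between source and target means (marginal term) plus the squared distances between per-class means (conditional term), using pseudo-labels for the unlabeled target domain. The helpers `spd_log`, `to_tangent`, and `jda_discrepancy` are hypothetical names, and this is a simplified linear-kernel stand-in, not DOT's formulation or its learned transport.

```python
import numpy as np


def spd_log(S: np.ndarray) -> np.ndarray:
    """Matrix logarithm of a symmetric positive definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    # V * log(w) scales column j of V by log(w[j]), i.e. V @ diag(log w) @ V.T
    return (V * np.log(w)) @ V.T


def to_tangent(covs: np.ndarray) -> np.ndarray:
    """Map a stack of SPD matrices (k, n, n) to flattened log-Euclidean tangent vectors (k, n*n)."""
    return np.stack([spd_log(C).ravel() for C in covs])


def jda_discrepancy(src_covs, src_labels, tgt_covs, tgt_pseudo_labels):
    """Hypothetical JDA-style discrepancy in the log-Euclidean tangent space (linear kernel).

    Marginal term: squared distance between the source and target tangent-space means.
    Conditional term: squared distance between per-class means, where the target
    domain is labeled with pseudo-labels (assumed to come from some classifier).
    """
    Xs, Xt = to_tangent(src_covs), to_tangent(tgt_covs)
    marginal = np.sum((Xs.mean(axis=0) - Xt.mean(axis=0)) ** 2)
    conditional = 0.0
    for c in np.unique(src_labels):
        in_src = src_labels == c
        in_tgt = tgt_pseudo_labels == c
        if in_tgt.any():  # skip classes that the pseudo-labels never predict
            conditional += np.sum((Xs[in_src].mean(axis=0) - Xt[in_tgt].mean(axis=0)) ** 2)
    return marginal + conditional


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((10, 3, 3))
    src = A @ A.transpose(0, 2, 1) + 3 * np.eye(3)  # well-conditioned synthetic SPD matrices
    labels = np.array([0, 1] * 5)
    print(jda_discrepancy(src, labels, src, labels))      # identical domains: 0.0
    print(jda_discrepancy(src, labels, 2 * src, labels))  # scaled target: 3 * 3 * log(2)**2, about 4.32
```

Scaling every target matrix by 2 shifts each matrix logarithm by log(2) times the identity, so the marginal term and each of the two class-conditional terms each contribute 3 · log(2)² for 3×3 matrices. Working in the tangent space keeps the sketch simple; it ignores the curvature that affine-invariant metrics or learned OT maps would account for.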