Safety assurance cannot be compromised in safety-critical environments in the presence of drastic model uncertainties (e.g., distributional shift), especially with humans in the loop. However, incorporating uncertainty into safe learning naturally leads to a bi-level problem, in which the lower level evaluates the (worst-case) safety constraint over the uncertainty ambiguity set. In this paper, we present a tractable distributionally safe reinforcement learning framework that enforces safety under distributional shift measured by a Wasserstein metric. To improve tractability, we first use duality theory to transform the lower-level optimization from the infinite-dimensional probability space, where the distributional shift is measured, to a finite-dimensional parametric space. Moreover, via differentiable convex programming, the bi-level safe learning problem is further reduced to a single-level one with two sequential, computationally efficient modules: a convex quadratic program that guarantees safety, followed by projected gradient ascent that simultaneously finds the worst-case uncertainty. To the best of our knowledge, this end-to-end differentiable framework with safety constraints is the first tractable single-level solution to distributional safety. We test our approach on first- and second-order systems of varying complexity and compare the results against uncertainty-agnostic policies; our approach demonstrates a significant improvement in safety guarantees.
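
To make the two sequential modules concrete, the following is a minimal sketch, not the paper's implementation: it assumes a single-integrator system, a circular obstacle barrier, and a Euclidean disturbance ball as a finite-dimensional stand-in for the dualized Wasserstein ambiguity set; all function names, dynamics, and parameters here are illustrative assumptions.

```python
# Illustrative sketch only: single-integrator dynamics x_dot = u + w and a
# Euclidean ball ||w|| <= eps stand in for the dualized Wasserstein ambiguity set.
import numpy as np
import cvxpy as cp

def safety_qp(x, u_nom, w, x_obs, r, alpha=1.0):
    """Module 1 (convex QP): minimally modify the nominal action so the
    barrier condition dh/dt + alpha * h >= 0 holds under disturbance w."""
    u = cp.Variable(2)
    h = float(np.dot(x - x_obs, x - x_obs) - r**2)   # barrier value h(x)
    grad_h = 2.0 * (x - x_obs)                       # dh/dx
    # dh/dt = grad_h @ (u + w) for the assumed dynamics x_dot = u + w
    prob = cp.Problem(cp.Minimize(cp.sum_squares(u - u_nom)),
                      [grad_h @ (u + w) + alpha * h >= 0.0])
    prob.solve()
    return u.value

def worst_case_disturbance(x, u, x_obs, r, eps, alpha=1.0, steps=50, lr=0.1):
    """Module 2 (projected gradient ascent): find w in the eps-ball that
    most degrades the barrier condition grad_h @ (u + w) + alpha * h."""
    h = float(np.dot(x - x_obs, x - x_obs) - r**2)
    grad_h = 2.0 * (x - x_obs)
    w = np.zeros(2)
    for _ in range(steps):
        w = w - lr * grad_h                 # the condition is linear in w here
        norm = np.linalg.norm(w)
        if norm > eps:                      # project onto the ball ||w|| <= eps
            w *= eps / norm
    margin = grad_h @ (u + w) + alpha * h   # residual safety margin at the worst case
    return w, margin

# Alternate the two modules: safety filter, then worst-case uncertainty update.
x, x_obs, r, eps = np.array([1.0, 1.0]), np.array([0.0, 0.0]), 0.5, 0.2
u_nom = -x                                  # nominal policy: drive the state to the origin
w = np.zeros(2)
for _ in range(5):
    u_safe = safety_qp(x, u_nom, w, x_obs, r)
    w, margin = worst_case_disturbance(x, u_safe, x_obs, r, eps)
print("safe action:", u_safe, "worst-case disturbance:", w, "margin:", margin)
```

Because both modules are standard differentiable convex/projection operations, this alternation hints at how the bi-level problem collapses into a single differentiable pipeline; the paper's actual formulation operates on the dual variables of the Wasserstein constraint rather than on a raw disturbance ball.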