Modern wireless channels are increasingly dense and mobile making the channel highly non-stationary. The time-varying distribution and the existence of joint interference across multiple degrees of freedom (e.g., users, antennas, frequency and symbols) in such channels render conventional precoding sub-optimal in practice, and have led to historically poor characterization of their statistics. The core of our work is the derivation of a high-order generalization of Mercer's Theorem to decompose the non-stationary channel into constituent fading sub-channels (2-D eigenfunctions) that are jointly orthogonal across its degrees of freedom. Consequently, transmitting these eigenfunctions with optimally derived coefficients eventually mitigates any interference across these dimensions and forms the foundation of the proposed joint spatio-temporal precoding. The precoded symbols directly reconstruct the data symbols at the receiver upon demodulation, thereby significantly reducing its computational burden, by alleviating the need for any complementary decoding. These eigenfunctions are paramount to extracting the second-order channel statistics, and therefore completely characterize the underlying channel. Theory and simulations show that such precoding leads to ${>}10^4{\times}$ BER improvement (at 20dB) over existing methods for non-stationary channels.