In this paper, the performance of three deep learning methods for predicting the short-term evolution and reproducing the long-term statistics of a multi-scale spatio-temporal Lorenz 96 system is examined. The methods are an echo state network (a type of reservoir computing, RC-ESN), a deep feed-forward artificial neural network (ANN), and a recurrent neural network with long short-term memory (RNN-LSTM). This Lorenz system has three tiers of nonlinearly interacting variables representing slow/large-scale ($X$), intermediate ($Y$), and fast/small-scale ($Z$) processes. For both training and testing, only $X$ is available; $Y$ and $Z$ are never known or used. It is shown that RC-ESN substantially outperforms ANN and RNN-LSTM for short-term prediction, e.g., accurately forecasting the chaotic trajectories for hundreds of the numerical solver's time steps, equivalent to several Lyapunov timescales. RNN-LSTM and ANN show some prediction skill as well, with RNN-LSTM outperforming ANN. Furthermore, even after the predicted trajectories diverge from the truth, the data predicted by RC-ESN and RNN-LSTM have probability density functions (PDFs) that closely match the true PDF, even at the tails, whereas the PDF of the ANN-predicted data deviates from the true PDF. Implications, caveats, and applications to fully data-driven and to inexact, data-assisted surrogate modeling of complex dynamical systems such as weather and climate are discussed.
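For reference, the three-tier system described above is commonly written in the following form; this is a sketch of the standard multi-scale Lorenz 96 equations, in which the coupling coefficients $h$, $b$, $c$, $d$, $e$, $g$ and the forcing $F$ are conventional choices assumed here rather than taken from this abstract:
\begin{align}
\frac{dX_k}{dt} &= X_{k-1}\left(X_{k+1}-X_{k-2}\right) - X_k + F - \frac{hc}{b}\sum_{j} Y_{j,k}, \\
\frac{dY_{j,k}}{dt} &= -cb\, Y_{j+1,k}\left(Y_{j+2,k}-Y_{j-1,k}\right) - c\, Y_{j,k} + \frac{hc}{b}X_k - \frac{he}{d}\sum_{i} Z_{i,j,k}, \\
\frac{dZ_{i,j,k}}{dt} &= ed\, Z_{i-1,j,k}\left(Z_{i+1,j,k}-Z_{i+2,j,k}\right) - ge\, Z_{i,j,k} + \frac{he}{d}Y_{j,k},
\end{align}
where, as stated above, only the slow/large-scale variables $X_k$ are observed and used for training and testing the data-driven models.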