This paper uses the weather forecasting as an application background to illustrate the technique of \textit{deep uncertainty learning} (DUL). Weather forecasting has great significance throughout human history and is traditionally approached through numerical weather prediction (NWP) in which the atmosphere is modelled as differential equations. However, due to the instability of these differential equations in the presence of uncertainties, weather forecasting through numerical simulations may not be reliable. This paper explores weather forecasting as a data mining problem. We build a deep prediction interval (DPI) model based on sequence-to-sequence (seq2seq) that predicts spatio-temporal patterns of meteorological variables in the future 37 hours, which incorporates the informative knowledge of NWP. A big contribution and surprising finding in the training process of DPI is that training by mean variance error (MVE) loss instead of mean square error loss can significantly improve the generalization of point estimation, which has never been reported in previous researches. We think this phenomenon can be regarded as a new kind of regularization which can not only be on a par with the famous Dropout but also provide more uncertainty information, and hence comes into win-win situation. Based on single DPI, we then build deep ensemble. We evaluate our method on dataset from 10 realistic weather stations in Beijing of China. Experimental results shown DPI has better generalization than traditional point estimation and deep ensemble can further improve the performance. The deep ensemble method also achieved top-2 online score ranking in the competition of AI Challenger 2018. It can dramatically decrease up to 56\% error compared with NWP.