Deep learning (DL) provides a methodology for predicting extreme loads observed in energy grids. Forecasting energy loads and prices is challenging because intraday system constraints, driven by fluctuations in supply and demand, produce sharp peaks and troughs. We propose combining deep spatio-temporal models with extreme value theory (DL-EVT) to capture the tail behavior of load spikes. Deep architectures, such as ReLU networks and LSTMs, can model generation trends and temporal dependencies, while EVT captures highly volatile load spikes. To illustrate our methodology, we develop a deep predictor using hourly price and demand data from the PJM Interconnection for 4719 nodes. DL-EVT outperforms traditional Fourier and time series methods, both in- and out-of-sample, by capturing the nonlinearities in prices. Finally, we conclude with directions for future research.
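To make the EVT component concrete, the following is a minimal sketch of a peaks-over-threshold tail fit, assuming that exceedances of load over a high threshold follow a generalized Pareto distribution (GPD). It is illustrative only and is not the estimation procedure used in this work: the function names, the 90% threshold, the method-of-moments estimator, and the synthetic exponential loads are all assumptions chosen for brevity.

```python
import math
import random
import statistics


def fit_gpd_moments(exceedances):
    """Method-of-moments estimates (xi, sigma) for a GPD tail.

    Uses mean = sigma / (1 - xi) and var = sigma^2 / ((1 - xi)^2 (1 - 2 xi)),
    which requires a finite variance (xi < 0.5).
    """
    m = statistics.fmean(exceedances)
    v = statistics.variance(exceedances)
    ratio = m * m / v
    xi = 0.5 * (1.0 - ratio)         # shape parameter
    sigma = 0.5 * m * (1.0 + ratio)  # scale parameter
    return xi, sigma


def peaks_over_threshold(loads, quantile=0.9):
    """Pick an empirical-quantile threshold and fit a GPD to the excesses."""
    ordered = sorted(loads)
    u = ordered[int(quantile * len(ordered))]
    exceedances = [x - u for x in loads if x > u]
    xi, sigma = fit_gpd_moments(exceedances)
    exceed_rate = len(exceedances) / len(loads)
    return u, xi, sigma, exceed_rate


def return_level(u, xi, sigma, exceed_rate, p):
    """Load level exceeded with probability p, for p below exceed_rate."""
    ratio = exceed_rate / p
    if abs(xi) < 1e-9:
        # Limit of the GPD quantile as xi -> 0 (exponential tail).
        return u + sigma * math.log(ratio)
    return u + sigma / xi * (ratio ** xi - 1.0)


# Synthetic hourly "loads" (hypothetical data): baseline plus exponential spikes.
rng = random.Random(0)
loads = [100.0 + rng.expovariate(0.1) for _ in range(20000)]

u, xi, sigma, rate = peaks_over_threshold(loads)
print(f"threshold={u:.1f}  xi={xi:.3f}  sigma={sigma:.2f}")
print(f"1-in-1000 load level: {return_level(u, xi, sigma, rate, 0.001):.1f}")
```

Because the synthetic spikes are exponential, the fitted shape should come out near zero and the scale near 10. With real load data, a positive shape estimate would indicate a heavier tail, which is the regime where a dedicated tail model matters most.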