Obtaining accurate information about future traffic flows on all links in a traffic network is of great importance for traffic management and control applications. This research studies two particular problems in traffic forecasting: (1) capturing the dynamic and non-local spatial correlation between traffic links and (2) modeling the dynamics of temporal dependency for accurate multi-step-ahead prediction. To address these problems, we propose a deep learning framework named the Spatial-Temporal Sequence to Sequence model (STSeq2Seq). The model builds on the sequence-to-sequence (seq2seq) architecture to capture temporal features and relies on graph convolution to aggregate spatial information. Moreover, STSeq2Seq defines and constructs pattern-aware adjacency matrices (PAMs) based on the pair-wise similarity of recent traffic patterns on traffic links and integrates them into the graph convolution operation. It also deploys a novel seq2seq architecture that couples a convolutional encoder with a recurrent decoder equipped with an attention mechanism, so that long-range dependence between different time steps is modeled dynamically. We conduct extensive experiments on two publicly available large-scale traffic datasets and compare STSeq2Seq with baseline models. The numerical results demonstrate that the proposed model achieves state-of-the-art forecasting performance in terms of various error measures. The ablation study verifies the effectiveness of PAMs in capturing dynamic non-local spatial correlation and the superiority of the proposed seq2seq architecture in modeling non-stationary temporal dependency for multi-step-ahead prediction. Furthermore, qualitative analysis is conducted on the PAMs as well as the attention weights for model interpretation.
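
To make the idea of a pattern-aware adjacency matrix concrete, the following is a minimal sketch and not the formulation used in this work. It assumes cosine similarity between mean-centered recent observation windows as the pair-wise similarity measure and a row-wise softmax as the normalization; neither choice is stated above. The names `pattern_aware_adjacency`, `graph_conv`, and `temperature` are hypothetical and introduced only for illustration.

```python
import numpy as np


def pattern_aware_adjacency(recent, temperature=1.0):
    """Build a row-normalized similarity matrix from recent traffic patterns.

    recent: array of shape (num_links, window), one recent series per link.
    Assumption: cosine similarity of mean-centered windows, softmax per row.
    """
    centered = recent - recent.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(centered, axis=1, keepdims=True)
    unit = centered / np.maximum(norms, 1e-8)  # guard against flat series
    similarity = unit @ unit.T                 # values in [-1, 1]

    # Row-wise softmax so each link's neighbor weights sum to one.
    logits = similarity / temperature
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)


def graph_conv(features, adjacency, weight):
    """One graph convolution step: aggregate neighbors, project, apply ReLU."""
    return np.maximum(adjacency @ features @ weight, 0.0)


# Toy usage with random data: 4 links, 12 recent time steps per link.
rng = np.random.default_rng(0)
recent = rng.random((4, 12))
pam = pattern_aware_adjacency(recent)

features = rng.random((4, 3))   # 3 input features per link
weight = rng.random((3, 2))     # projects to 2 hidden features
hidden = graph_conv(features, pam, weight)
```

Because the matrix is recomputed from the most recent window, two links that are far apart in the road network can still receive a large weight when their recent patterns are similar, which is the dynamic, non-local behavior the abstract describes.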