We present a novel attention-based sequential model for mutually dependent spatio-temporal discrete event data, providing a versatile framework for capturing the non-homogeneous influence of events. We go beyond the assumption that the influence of a historical event (which causes an upward or downward jump in the intensity function) fades monotonically over time, a key assumption made by many widely used point process models, including those based on Recurrent Neural Networks (RNNs). We borrow the idea of an attention model based on a probabilistic score function, which leads to a flexible and highly interpretable representation of the intensity function. We demonstrate the superior performance of our approach compared to the state of the art on both synthetic and real data.