Spatiotemporal prediction of event data is a challenging task with a long history of research. Recent work has leveraged deep sequential models that substantially improve over classical approaches, but these models are prone to overfitting when observations are extremely sparse, as in crime event prediction. To overcome this sparsity issue, we present Multi-axis Attentive Prediction for Sparse Event Data (MAPSED), a purely attentional approach that extracts both the short-term dynamics and the long-term semantics of event propagation from two observation angles. Unlike existing temporal prediction models, which propagate latent information primarily along the temporal dimension, MAPSED operates simultaneously over all axes (time, 2D space, event type) of the embedded data tensor. We additionally introduce a novel Frobenius norm-based contrastive learning objective to improve the generalization of the latent representations. Empirically, we validate MAPSED on two publicly accessible urban crime datasets for sparse spatiotemporal event prediction, where it outperforms both classical and state-of-the-art deep learning models. The proposed contrastive learning objective significantly enhances MAPSED's ability to capture the semantics and dynamics of the events, resulting in better generalization under sparse observations.
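
To make the idea of attending over every axis of the embedded tensor concrete, the following is a minimal NumPy sketch and not the MAPSED implementation. It assumes an embedded tensor of shape (time, height, width, event type, embedding), uses identity query/key/value projections for brevity, and combines the per-axis outputs by averaging them with a residual connection. The function names `axis_attention` and `multi_axis_attention` and the combination rule are illustrative assumptions.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def axis_attention(x, axis):
    """Scaled dot-product self-attention along one axis of x.

    The last dimension of x is the embedding. Queries, keys, and values
    are x itself (identity projections; an assumption made for brevity).
    """
    d = x.shape[-1]
    xs = np.moveaxis(x, axis, -2)                        # (..., L, D)
    scores = xs @ np.swapaxes(xs, -1, -2) / np.sqrt(d)   # (..., L, L)
    out = softmax(scores, axis=-1) @ xs                  # (..., L, D)
    return np.moveaxis(out, -2, axis)                    # restore axis order


def multi_axis_attention(x, axes=(0, 1, 2, 3)):
    """Attend along each listed axis, then average the results.

    Averaging with a residual connection is a hypothetical combination
    rule chosen for this sketch.
    """
    attended = sum(axis_attention(x, a) for a in axes) / len(axes)
    return x + attended


rng = np.random.default_rng(0)
# Assumed layout: (time, height, width, event type, embedding).
x = rng.normal(size=(6, 4, 4, 3, 8))
y = multi_axis_attention(x)
print(y.shape)
```

In this sketch each cell exchanges information with its neighbours along time, both spatial axes, and the event-type axis in a single layer, in contrast to a model that propagates latent state along the time axis only.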