Accuracy and interpretability are two essential properties of a crime prediction model. Because crimes can have adverse effects on human life, the economy, and safety, we need a model that predicts future crime occurrences as accurately as possible so that preventive steps can be taken early. In addition, an interpretable model reveals the reasons behind its predictions, which makes the model transparent and allows crime prevention steps to be planned accordingly. The key challenge in developing such a model is to capture the non-linear spatial dependencies and temporal patterns of a specific crime category while keeping the model's underlying structure interpretable. In this paper, we develop AIST, an Attention-based Interpretable Spatio Temporal Network for crime prediction. AIST models the dynamic spatio-temporal correlations of a crime category based on past crime occurrences, external features (e.g., traffic flow and point of interest (POI) information), and recurring crime trends. Extensive experiments on real datasets demonstrate the superiority of our model in terms of both accuracy and interpretability.