We investigate spatio-temporal prediction and introduce a novel prediction algorithm. Our approach is based on point processes, which we use to model event arrivals in both space and time. Although we specifically use the Hawkes process, other point processes can be readily substituted, as noted in remarks throughout the paper. We partition the given spatial region into subregions with an adaptive decision tree and model each subregion with its own point process, which also interacts with the processes of the other subregions. Using these per-subregion processes, we estimate the times and locations of future events from past event times and locations. Thanks to the nonstationary, self-exciting point generation mechanism of the Hawkes process and the adaptive partitioning of the space, we model the data as nonstationary in both time and space. We provide a gradient-based algorithm that jointly optimizes the adaptive tree parameters and the point process parameters; through this joint optimization, our algorithm infers both the source statistics and the adaptive partitioning of the region. We also provide a training algorithm for the online setting, in which we update the model parameters as new points arrive. In experiments on both simulated and real-life data, we compare our approach with standard approaches and demonstrate significant performance improvements, which we attribute to the adaptive spatial partitioning mechanism and the joint optimization procedure.
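
As a minimal illustration of the self-exciting mechanism referred to above, the sketch below evaluates the conditional intensity of a univariate Hawkes process for a single subregion. It assumes an exponential excitation kernel, so that the intensity is the baseline rate plus a decaying contribution from each past event. The kernel choice, the function name `hawkes_intensity`, the parameter names `mu`, `alpha`, `beta`, and the event times are all illustrative assumptions and are not taken from the paper.

```python
import math


def hawkes_intensity(t, past_times, mu=0.5, alpha=0.8, beta=1.2):
    """Conditional intensity of a univariate Hawkes process at time t.

    Assumed form (exponential kernel, an illustrative choice):
        lambda(t) = mu + sum over t_i < t of alpha * exp(-beta * (t - t_i))
    mu    : baseline event rate
    alpha : jump in intensity caused by each past event
    beta  : decay rate of that excitation
    """
    excitation = sum(
        alpha * math.exp(-beta * (t - t_i))
        for t_i in past_times
        if t_i < t  # only events strictly before t contribute
    )
    return mu + excitation


# Hypothetical event times observed in one subregion.
events = [1.0, 1.5, 4.0]

baseline = hawkes_intensity(0.5, events)     # before any event: baseline only
after_burst = hawkes_intensity(1.6, events)  # shortly after two nearby events
long_after = hawkes_intensity(100.0, events) # excitation has decayed away
```

Under these assumptions, the intensity rises immediately after a cluster of events and relaxes back toward `mu`, which is the time-varying behavior that makes the process nonstationary.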