https://github.com/JennyXieJiayi/STAR-HiT.
Next point-of-interest (POI) recommendation is a critical task in location-based social networks, yet remains challenging due to a high degree of variation and personalization exhibited in user movements. In this work, we explore the latent hierarchical structure composed of multi-granularity short-term structural patterns in user check-in sequences. We propose a Spatio-Temporal context AggRegated Hierarchical Transformer (STAR-HiT) for next POI recommendation, which employs stacked hierarchical encoders to recursively encode the spatio-temporal context and explicitly locate subsequences of different granularities. More specifically, in each encoder, the global attention layer captures the spatio-temporal context of the sequence, while the local attention layer performed within each subsequence enhances subsequence modeling using the local context. The sequence partition layer infers positions and lengths of subsequences from the global context adaptively, such that semantics in subsequences can be well preserved. Finally, the subsequence aggregation layer fuses representations within each subsequence to form the corresponding subsequence representation, thereby generating a new sequence of higher-level granularity. The stacking of encoders captures the latent hierarchical structure of the check-in sequence, which is used to predict the next visiting POI. Extensive experiments on three public datasets demonstrate that the proposed model achieves superior performance whilst providing explanations for recommendations. Codes are available at