Abstract:Urban spatio-temporal (ST) forecasting is crucial for various urban applications such as intelligent scheduling and trip planning. Previous studies focus on modeling ST correlations among urban locations in offline settings, which often neglect the non-stationary nature of urban ST data, particularly, distribution shifts over time. This oversight can lead to degraded performance in real-world scenarios. In this paper, we first analyze the distribution shifts in urban ST data, and then introduce DOST, a novel online continual learning framework tailored for ST data characteristics. DOST employs an adaptive ST network equipped with a variable-independent adapter to address the unique distribution shifts at each urban location dynamically. Further, to accommodate the gradual nature of these shifts, we also develop an awake-hibernate learning strategy that intermittently fine-tunes the adapter during the online phase to reduce computational overhead. This strategy integrates a streaming memory update mechanism designed for urban ST sequential data, enabling effective network adaptation to new patterns while preventing catastrophic forgetting. Experimental results confirm DOST's superiority over state-of-the-art models on four real-world datasets, providing online forecasts within an average of 0.1 seconds and achieving a 12.89% reduction in forecast errors compared to baseline models.