Abstract:Covariates play an indispensable role in practical time series forecasting, offering rich context from the past and sometimes extending into the future. However, their availability varies depending on the scenario, and situations often involve multiple target variables simultaneously. Moreover, the cross-variate dependencies between them are multi-granular, with some covariates having a short-term impact on target variables and others showing long-term correlations. This heterogeneity and the intricate dependencies arising in covariate-informed forecasting present significant challenges to existing deep models. To address these issues, we propose CITRAS, a patch-based Transformer that flexibly leverages multiple targets and covariates covering both the past and the future forecasting horizon. While preserving the strong autoregressive capabilities of the canonical Transformer, CITRAS introduces two novel mechanisms in patch-wise cross-variate attention: Key-Value (KV) Shift and Attention Score Smoothing. KV Shift seamlessly incorporates future known covariates into the forecasting of target variables based on their concurrent dependencies. Additionally, Attention Score Smoothing transforms locally accurate patch-wise cross-variate dependencies into global variate-level dependencies by smoothing the past series of attention scores. Experimentally, CITRAS achieves state-of-the-art performance in both covariate-informed and multivariate forecasting, demonstrating its versatile ability to leverage cross-variate dependency for improved forecasting accuracy.