Abstract:In multivariate time series classification, although current sequence analysis models have excellent classification capabilities, they show significant shortcomings when dealing with long sequence multivariate data, such as prolonged training times and decreased accuracy. This paper focuses on optimizing model performance for long-sequence multivariate data by mitigating the impact of extended time series and multiple variables on the model. We propose a principal component analysis (PCA)-based temporal streaming compression and dimensionality reduction algorithm for time series data (temporal streaming batch PCA, TSBPCA), which continuously updates the compact representation of the entire sequence through streaming PCA time estimation with time block updates, enhancing the data representation capability of a range of sequence analysis models. We evaluated this method using various models on five real datasets, and the experimental results show that our method performs well in terms of classification accuracy and time efficiency. Notably, our method demonstrates a trend of increasing effectiveness as sequence length grows; on the two longest sequence datasets, accuracy improved by about 7.2%, and execution time decreased by 49.5%.