Abstract: Time series data are collected in temporal order and are widely used to train systems for prediction, modeling, and classification, among other tasks. These systems require large amounts of data to improve generalization and prevent over-fitting, yet time series data are comparatively scarce owing to operational constraints. This scarcity can be alleviated by synthesizing data that have a suitable spread of features while retaining the distinctive features of the original data: its basic statistical properties and overall shape, which are important for short time series such as those in rehabilitative applications, or for quickly changing portions of lengthy data. In our earlier work, synthesized surrogate time series were used to augment rehabilitative data. This gave good classification results, but the resulting waveforms did not preserve the original signal shape. To remedy this, we use singular spectrum analysis (SSA) to separate a signal into trend and cycle components, which describe its shape, and low-level components. In a novel way, we subject the low-level component to randomizing processes and then recombine it with the original trend and cycle components to form a synthetic time series. We compare our approach with other methods using statistical and shape measures, and demonstrate its effectiveness in classification.
Abstract: Time series (TS) data have consistently been in short supply, yet demand for them remains high for training systems in prediction, modeling, classification, and various other applications. Synthesis can expand the sample population, but it is crucial that the synthesized TS maintain the statistical characteristics of the original: this ensures consistent sampling of data for both training and testing. However, the time-domain features of the data may not be maintained. This motivates our work, the objective of which is to preserve the following features in a synthesized TS: its fundamental statistical characteristics and important time-domain features such as its general shape and prominent transients. In a novel way, we first isolate important TS features into separate components using a spectrogram and singular spectrum analysis. The residual signal is then randomized in a way that preserves its statistical properties, and the components are recombined to form the synthetic time series. Using accelerometer data collected in a clinical setting, we compare our method to others with statistical and shape measures. We show that it has higher fidelity to the original signal features, offers good diversity, and performs better in a deep-learning classification application.
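The pipeline described in the abstracts (SSA decomposition into smooth trend/cycle components plus a residual, randomization of the residual in a statistics-preserving way, then recombination) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the window length, number of retained components, and the use of Fourier-phase randomization as the surrogate step are all assumptions made for the example.

```python
import numpy as np

def ssa_decompose(x, window, n_components):
    """Split x into a smooth part (leading SSA components) and a residual.

    Embeds x into a Hankel trajectory matrix, takes its SVD, and
    reconstructs the leading singular components by diagonal averaging.
    """
    n = len(x)
    k = n - window + 1
    # Hankel trajectory matrix: each column is a lagged window of x
    X = np.column_stack([x[i:i + window] for i in range(k)])
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Rank-limited reconstruction from the leading singular triples
    X_smooth = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components]
    # Diagonal averaging (Hankelization) back to a 1-D series
    smooth = np.zeros(n)
    counts = np.zeros(n)
    for j in range(k):
        smooth[j:j + window] += X_smooth[:, j]
        counts[j:j + window] += 1
    smooth /= counts
    return smooth, x - smooth

def phase_randomize(r, rng):
    """Fourier-phase surrogate: keeps the amplitude spectrum (hence
    variance and autocorrelation, approximately) but scrambles the waveform."""
    spec = np.fft.rfft(r)
    phases = rng.uniform(0.0, 2.0 * np.pi, len(spec))
    surrogate = np.abs(spec) * np.exp(1j * phases)
    surrogate[0] = spec[0]  # keep the DC term so the mean is preserved
    return np.fft.irfft(surrogate, n=len(r))

def synthesize(x, window=30, n_components=3, seed=0):
    """Synthetic series = SSA trend/cycle part + randomized residual."""
    rng = np.random.default_rng(seed)
    smooth, residual = ssa_decompose(np.asarray(x, dtype=float),
                                     window, n_components)
    return smooth + phase_randomize(residual, rng)
```

Because the trend/cycle part is carried over unchanged, the synthetic series keeps the overall shape of the original, while the phase-randomized residual supplies diversity without altering the residual's second-order statistics.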