Abstract:Deep learning (DL) models have seen increased attention for time series forecasting, yet the application on cyber-physical systems (CPS) is hindered by the lacking robustness of these methods. Thus, this study evaluates the robustness and generalization performance of DL architectures on multivariate time series data from CPS. Our investigation focuses on the models' ability to handle a range of perturbations, such as sensor faults and noise, and assesses their impact on overall performance. Furthermore, we test the generalization and transfer learning capabilities of these models by exposing them to out-of-distribution (OOD) samples. These include deviations from standard system operations, while the core dynamics of the underlying physical system are preserved. Additionally, we test how well the models respond to several data augmentation techniques, including added noise and time warping. Our experimental framework utilizes a simulated three-tank system, proposed as a novel benchmark for evaluating the robustness and generalization performance of DL algorithms in CPS data contexts. The findings reveal that certain DL model architectures and training techniques exhibit superior effectiveness in handling OOD samples and various perturbations. These insights have significant implications for the development of DL models that deliver reliable and robust performance in real-world CPS applications.