Abstract:Anomaly detection (AD) plays a crucial role in time series applications, primarily because time series data is employed across real-world scenarios. Detecting anomalies poses significant challenges since anomalies take diverse forms making them hard to pinpoint accurately. Previous research has explored different AD models, making specific assumptions with varying sensitivity toward particular anomaly types. To address this issue, we propose a novel model selection for unsupervised AD using a combination of time series forest (TSF) and reinforcement learning (RL) approaches that dynamically chooses an AD technique. Our approach allows for effective AD without explicitly depending on ground truth labels that are often scarce and expensive to obtain. Results from the real-time series dataset demonstrate that the proposed model selection approach outperforms all other AD models in terms of the F1 score metric. For the synthetic dataset, our proposed model surpasses all other AD models except for KNN, with an impressive F1 score of 0.989. The proposed model selection framework also exceeded the performance of GPT-4 when prompted to act as an anomaly detector on the synthetic dataset. Exploring different reward functions revealed that the original reward function in our proposed AD model selection approach yielded the best overall scores. We evaluated the performance of the six AD models on an additional three datasets, having global, local, and clustered anomalies respectively, showing that each AD model exhibited distinct performance depending on the type of anomalies. This emphasizes the significance of our proposed AD model selection framework, maintaining high performance across all datasets, and showcasing superior performance across different anomaly types.
Abstract:Building an accurate load forecasting model with minimal underpredictions is vital to prevent any undesired power outages due to underproduction of electricity. However, the power consumption patterns of the residential sector contain fluctuations and anomalies making them challenging to predict. In this paper, we propose multiple Long Short-Term Memory (LSTM) frameworks with different asymmetric loss functions to impose a higher penalty on underpredictions. We also apply a density-based spatial clustering of applications with noise (DBSCAN) anomaly detection approach, prior to the load forecasting task, to remove any present oultiers. Considering the effect of weather and social factors, seasonality splitting is performed on the three considered datasets from France, Germany, and Hungary containing hourly power consumption, weather, and calendar features. Root-mean-square error (RMSE) results show that removing the anomalies efficiently reduces the underestimation and overestimation errors in all the seasonal datasets. Additionally, asymmetric loss functions and seasonality splitting effectively minimize underestimations despite increasing the overestimation error to some degree. Reducing underpredictions of electricity consumption is essential to prevent power outages that can be damaging to the community.