L'EMbeDS and Institute of Economics, Sant'Anna School of Advanced Studies, Pisa
Abstract:Structural network embedding is a crucial step in enabling effective downstream tasks for complex systems that aims to project a network into a lower-dimensional space while preserving similarities among nodes. We introduce a simple and efficient embedding technique based on approximate variants of equitable partitions. The approximation consists in introducing a user-tunable tolerance parameter relaxing the otherwise strict condition for exact equitable partitions that can be hardly found in real-world networks. We exploit a relationship between equitable partitions and equivalence relations for Markov chains and ordinary differential equations to develop a partition refinement algorithm for computing an approximate equitable partition in polynomial time. We compare our method against state-of-the-art embedding techniques on benchmark networks. We report comparable -- when not superior -- performance for visualization, classification, and regression tasks at a cost between one and three orders of magnitude smaller using a prototype implementation, enabling the embedding of large-scale networks which could not be efficiently handled by most of the competing techniques.
Abstract:We present a novel, simple and widely applicable semi-supervised procedure for anomaly detection in industrial and IoT environments, SAnD (Simple Anomaly Detection). SAnD comprises 5 steps, each leveraging well-known statistical tools, namely; smoothing filters, variance inflation factors, the Mahalanobis distance, threshold selection algorithms and feature importance techniques. To our knowledge, SAnD is the first procedure that integrates these tools to identify anomalies and help decipher their putative causes. We show how each step contributes to tackling technical challenges that practitioners face when detecting anomalies in industrial contexts, where signals can be highly multicollinear, have unknown distributions, and intertwine short-lived noise with the long(er)-lived actual anomalies. The development of SAnD was motivated by a concrete case study from our industrial partner, which we use here to show its effectiveness. We also evaluate the performance of SAnD by comparing it with a selection of semi-supervised methods on public datasets from the literature on anomaly detection. We conclude that SAnD is effective, broadly applicable, and outperforms existing approaches in both anomaly detection and runtime.