Abstract:A fundamental task in science is to determine the underlying causal relations because it is the knowledge of this functional structure what leads to the correct interpretation of an effect given the apparent associations in the observed data. In this sense, Causal Discovery is a technique that tackles this challenge by analyzing the statistical properties of the constituent variables. In this work, we target the generalizability of the discovery method by following a reductionist approach that only involves two variables, i.e., the pairwise or bi-variate setting. We question the current (possibly misleading) baseline results on the basis that they were obtained through supervised learning, which is arguably contrary to this genuinely exploratory endeavor. In consequence, we approach this problem in an unsupervised way, using robust Mutual Information measures, and observing the impact of the different variable types, which is oftentimes ignored in the design of solutions. Thus, we provide a novel set of standard unbiased results that can serve as a reference to guide future discovery tasks in completely unknown environments.
Abstract:This paper describes the development of a causal diagnosis approach for troubleshooting an industrial environment on the basis of the technical language expressed in Return on Experience records. The proposed method leverages the vectorized linguistic knowledge contained in the distributed representation of a Large Language Model, and the causal associations entailed by the embedded failure modes and mechanisms of the industrial assets. The paper presents the elementary but essential concepts of the solution, which is conceived as a causality-aware retrieval augmented generation system, and illustrates them experimentally on a real-world Predictive Maintenance setting. Finally, it discusses avenues of improvement for the maturity of the utilized causal technology to meet the robustness challenges of increasingly complex scenarios in the industry.
Abstract:Ensuring the reliability of power electronic converters is a matter of great importance, and data-driven condition monitoring techniques are cementing themselves as an important tool for this purpose. However, translating methods that work well in controlled lab environments to field applications presents significant challenges, notably because of the limited diversity and accuracy of the lab training data. By enabling the use of field data, online machine learning can be a powerful tool to overcome this problem, but it introduces additional challenges in ensuring the stability and predictability of the training processes. This work presents an edge computing method that mitigates these shortcomings with minimal additional memory usage, by employing an autonomous algorithm that prioritizes the storage of training samples with larger prediction errors. The method is demonstrated on the use case of a self-commissioning condition monitoring system, in the form of a thermal anomaly detection scheme for a variable frequency motor drive, where the algorithm self-learned to distinguish normal and anomalous operation with minimal prior knowledge. The obtained results, based on experimental data, show a significant improvement in prediction accuracy and training speed, when compared to equivalent models trained online without the proposed data selection process.