Abstract:Industrial time series, as structured data that reflect production-process information, can support data-driven decision-making for effective monitoring of industrial production processes. However, time-series forecasting in industry faces several challenges, e.g., few-shot prediction caused by data shortage and decision confusion caused by unknown treatment policies. To cope with these problems, we propose a novel causal domain adaptation framework, the Causal Domain Adaptation (CDA) forecaster, to improve performance on the domain of interest that has limited data (the target domain). First, we analyze the causality that accompanies treatments, and thereby ensure that causality is shared over time. Subsequently, we propose an answer-based attention mechanism that obtains domain-invariant representations through the causality shared by both domains. Then, a novel domain adaptation model is built to model treatments and outcomes jointly, trained on both the source and target domains. The main insights are that the answer-based attention mechanism allows the target domain to leverage the causality existing in the source time series even under different treatments, and that the forecaster can predict counterfactual outcomes of industrial time series, thereby providing guidance for the production process. Compared with common baselines, our method demonstrates its effectiveness in cross-domain prediction and its practicality in guiding the production process on both real-world and synthetic oilfield datasets.
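The abstract does not define the answer-based attention mechanism, so the sketch below does not reproduce it. As a loosely related stand-in, it follows the attention-sharing idea used in some domain adaptation approaches to forecasting. Query and key projections are shared between source and target, playing the role of the shared causal structure. Each domain keeps its own value projection. The class name `SharedQKAttention`, the causal mask, and all dimensions are assumptions made only for illustration.

```python
import numpy as np

def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax; masked (-inf) entries receive zero weight.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

class SharedQKAttention:
    """Hypothetical attention-sharing layer: query/key projections are shared
    by all domains, value projections are domain-specific."""

    def __init__(self, d_in: int, d_k: int, domains=("source", "target"), seed: int = 0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(d_in)
        self.W_q = rng.normal(0.0, scale, size=(d_in, d_k))  # shared across domains
        self.W_k = rng.normal(0.0, scale, size=(d_in, d_k))  # shared across domains
        self.W_v = {d: rng.normal(0.0, scale, size=(d_in, d_k)) for d in domains}
        self.d_k = d_k

    def __call__(self, x: np.ndarray, domain: str):
        # x: (T, d_in), one row per time step (e.g. treatment and outcome features).
        q, k = x @ self.W_q, x @ self.W_k
        v = x @ self.W_v[domain]
        scores = q @ k.T / np.sqrt(self.d_k)
        # Causal mask: step t attends only to steps <= t.
        T = x.shape[0]
        future = np.triu(np.ones((T, T), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
        weights = softmax(scores, axis=-1)
        return weights @ v, weights

attn = SharedQKAttention(d_in=4, d_k=8)
x_src = np.random.default_rng(1).normal(size=(24, 4))  # longer source history
x_tgt = np.random.default_rng(2).normal(size=(6, 4))   # short target history
out_src, _ = attn(x_src, "source")
out_tgt, w_tgt = attn(x_tgt, "target")
```

The attention weights depend only on the shared projections, so the same input window produces the same weights in either domain. Only the values, and hence the outputs, differ between domains.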
Abstract:Effective prediction of shale gas production is crucial for strategic reservoir development. However, two main challenges arise in new shale gas blocks: (1) negative transfer due to insufficient data, and (2) the limited interpretability of deep learning (DL) models. To tackle these problems, we propose a novel transfer learning methodology that combines domain adaptation with physical constraints. The methodology effectively employs historical data from the source domain to reduce negative transfer from the perspective of data distribution, and uses physical constraints to build a robust and reliable prediction model that integrates various types of data. It starts by dividing the production data from the source domain into multiple subdomains, thereby enhancing data diversity. It then uses Maximum Mean Discrepancy (MMD) and global average distance measures to decide whether transfer is feasible. Through domain adaptation, all transferable knowledge is integrated, resulting in a more comprehensive target model. Lastly, by incorporating drilling, completion, and geological data as physical constraints, we develop a hybrid model that combines a multi-layer perceptron (MLP) and a Transformer (Transformer-MLP) and is designed to maximize interpretability. Experimental validation in southwestern China confirms the method's effectiveness.
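The MMD-based feasibility check can be illustrated with a biased Gaussian-kernel estimate of the squared MMD between each source subdomain and the target. The sketch covers MMD only, since the abstract does not define the global average distance measure. The kernel bandwidth, the feasibility threshold, the helper name `select_transferable`, and the synthetic samples are all hypothetical.

```python
import numpy as np

def gaussian_kernel(a: np.ndarray, b: np.ndarray, sigma: float) -> np.ndarray:
    # Pairwise squared Euclidean distances between the rows of a and b.
    d2 = (a**2).sum(axis=1)[:, None] + (b**2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def mmd2(x: np.ndarray, y: np.ndarray, sigma: float = 1.0) -> float:
    # Biased estimate of the squared MMD with a Gaussian (RBF) kernel.
    kxx = gaussian_kernel(x, x, sigma).mean()
    kyy = gaussian_kernel(y, y, sigma).mean()
    kxy = gaussian_kernel(x, y, sigma).mean()
    return float(kxx + kyy - 2.0 * kxy)

def select_transferable(subdomains, target, threshold, sigma=1.0):
    # Keep the indices of source subdomains whose MMD^2 to the target
    # falls below a (hypothetical) feasibility threshold.
    return [i for i, s in enumerate(subdomains) if mmd2(s, target, sigma) < threshold]

rng = np.random.default_rng(0)
target = rng.normal(0.0, 1.0, size=(200, 3))  # stand-in features of a new block
near = rng.normal(0.1, 1.0, size=(200, 3))    # similar source subdomain
far = rng.normal(3.0, 1.0, size=(200, 3))     # dissimilar source subdomain
selected = select_transferable([near, far], target, threshold=0.1)
```

The biased estimator is never negative and is exactly zero for identical samples. Because of that, a simple threshold is a reasonable first filter. A permutation test would be a more principled alternative.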
Abstract:To address the issues of stability and fidelity in interpretable learning, this paper presents a novel interpretable methodology, ensemble interpretation, which integrates multi-perspective explanations from various interpretation methods. On the one hand, we define a unified paradigm to describe the common mechanism of different interpretation methods, and then integrate the multiple interpretation results to achieve more stable explanations. On the other hand, a supervised evaluation method based on prior knowledge is proposed to evaluate the explanation performance of an interpretation method. Experimental results show that ensemble interpretation is more stable and more consistent with human experience and cognition. As an application, we use ensemble interpretation for feature selection, which significantly improves the generalization performance of the corresponding learning model.
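The abstract does not spell out the unified paradigm, so the sketch below rests on two simple assumptions. First, each method's attribution scores are mapped to non-negative weights summing to one and then averaged, optionally with weights. Second, the prior-knowledge evaluation is approximated by the fraction of the top-k features that experts already mark as relevant. The function names, the normalization, the precision-at-k score, and the example scores are illustrative choices, not the paper's definitions.

```python
import numpy as np

def normalize_attributions(a) -> np.ndarray:
    # Map one method's attribution scores to non-negative weights summing to 1.
    a = np.abs(np.asarray(a, dtype=float))
    total = a.sum()
    return a / total if total > 0 else np.full_like(a, 1.0 / a.size)

def ensemble_interpretation(attributions, weights=None) -> np.ndarray:
    # attributions: list of 1-D arrays, one per interpretation method
    # (e.g. permutation importance, gradient saliency, SHAP values).
    normed = np.stack([normalize_attributions(a) for a in attributions])
    if weights is None:
        weights = np.full(len(attributions), 1.0 / len(attributions))
    return np.asarray(weights, dtype=float) @ normed

def prior_knowledge_precision(scores, relevant, k: int) -> float:
    # Hypothetical supervised check: share of the top-k features that
    # domain experts already consider relevant.
    top_k = np.argsort(scores)[::-1][:k]
    return len(set(top_k.tolist()) & set(relevant)) / k

methods = [
    [0.9, 0.1, 0.0, 0.4],   # method A (illustrative scores)
    [0.7, -0.2, 0.1, 0.6],  # method B, signed scores
    [0.2, 0.1, 0.0, 0.9],   # method C
]
combined = ensemble_interpretation(methods)
precision = prior_knowledge_precision(combined, relevant={0, 3}, k=2)
```

Normalizing before averaging keeps a method with large raw magnitudes from dominating the ensemble. This is one simple way to trade a single method's idiosyncrasies for stability.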
Abstract:Machine learning approaches are widely studied for predicting the production of coalbed methane (CBM) wells after hydraulic fracturing, but they are rarely used in practice because of their low generalization ability and lack of interpretability. This article proposes a novel methodology that discovers the latent causality in observed data, aiming to provide an indirect way to interpret machine learning results. Based on the theory of causal discovery, a causal graph is derived with explicit input, output, treatment, and confounding variables. Then, SHAP (SHapley Additive exPlanations) is employed to analyze the influence of these factors on production capability, which indirectly interprets the machine learning models. The proposed method can capture the underlying nonlinear relationships between the factors and the output, remedying a limitation of traditional machine learning routines that rely on correlation analysis of factors. Experiments on CBM data show that the relationships between production and geological/engineering factors detected by the presented method are consistent with the actual physical mechanism. Meanwhile, compared with traditional methods, the interpretable machine learning models perform better in forecasting production capability, averaging a 20% improvement in accuracy.
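SHAP attributes a model's prediction to its input factors through Shapley values. The brute-force sketch below computes exact Shapley values for a small model by enumerating every coalition of factors and setting absent factors to a single baseline value. The `shap` library instead works with background data and faster estimators, and the sketch leaves out the causal-discovery step. The toy model `toy_production`, its three factors, and the baseline are hypothetical.

```python
import itertools
import math
import numpy as np

def shapley_values(f, x, baseline) -> np.ndarray:
    """Exact Shapley values of f at x; features outside a coalition take
    their baseline value. Cost grows as 2^n, so keep n small."""
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    n = x.size
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in range(n):  # coalition sizes 0 .. n-1, excluding feature i
            # Shapley weight |S|! (n - |S| - 1)! / n!
            w = math.factorial(r) * math.factorial(n - r - 1) / math.factorial(n)
            for subset in itertools.combinations(others, r):
                z = baseline.copy()
                idx = list(subset)
                z[idx] = x[idx]   # features in S take their observed values
                without_i = f(z)
                z[i] = x[i]       # add feature i to the coalition
                with_i = f(z)
                phi[i] += w * (with_i - without_i)
    return phi

def toy_production(z) -> float:
    # Hypothetical model: linear effect of factor 0 plus an interaction of factors 1 and 2.
    return 2.0 * z[0] + 0.5 * z[1] * z[2]

x = np.array([1.0, 2.0, 3.0])  # observed factor values for one well (illustrative)
baseline = np.zeros(3)         # reference values (illustrative)
phi = shapley_values(toy_production, x, baseline)
```

The attributions satisfy the efficiency property: they sum to `f(x) - f(baseline)`. The nonlinear interaction term is split equally between the two factors involved. A correlation-based ranking would not expose this kind of effect.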