Abstract:The accurate prediction of drought probability in specific regions is crucial for informed decision-making in agricultural practices. It is important to make predictions one year in advance, particularly for long-term decisions. However, forecasting this probability presents challenges due to the complex interplay of various factors within the region of interest and neighboring areas. In this study, we propose an end-to-end solution to address this issue based on various spatiotemporal neural networks. The models considered focus on predicting the drought intensity based on the Palmer Drought Severity Index (PDSI) for subregions of interest, leveraging intrinsic factors and insights from climate models to enhance drought predictions. Comparative evaluations demonstrate the superior accuracy of Convolutional LSTM (ConvLSTM) and transformer models compared to baseline gradient boosting and logistic regression solutions. The two former models achieved impressive ROC AUC scores from 0.90 to 0.70 for forecast horizons from one to six months, outperforming baseline models. The transformer showed superiority for shorter horizons, while ConvLSTM did so for longer horizons. Thus, we recommend selecting the models accordingly for long-term drought forecasting. To ensure the broad applicability of the considered models, we conduct extensive validation across regions worldwide, considering different environmental conditions. We also run several ablation and sensitivity studies to challenge our findings and provide additional information on how to solve the problem.
Abstract:The similarity learning problem in the oil \& gas industry aims to construct a model that estimates similarity between interval measurements for logging data. Previous attempts are mostly based on empirical rules, so our goal is to automate this process and exclude expensive and time-consuming expert labelling. One of the approaches for similarity learning is self-supervised learning (SSL). In contrast to the supervised paradigm, this one requires little or no labels for the data. Thus, we can learn such models even if the data labelling is absent or scarce. Nowadays, most SSL approaches are contrastive and non-contrastive. However, due to possible wrong labelling of positive and negative samples, contrastive methods don't scale well with the number of objects. Non-contrastive methods don't rely on negative samples. Such approaches are actively used in the computer vision. We introduce non-contrastive SSL for time series data. In particular, we build on top of BYOL and Barlow Twins methods that avoid using negative pairs and focus only on matching positive pairs. The crucial part of these methods is an augmentation strategy. Different augmentations of time series exist, while their effect on the performance can be both positive and negative. Our augmentation strategies and adaption for BYOL and Barlow Twins together allow us to achieve a higher quality (ARI $= 0.49$) than other self-supervised methods (ARI $= 0.34$ only), proving usefulness of the proposed non-contrastive self-supervised approach for the interval similarity problem and time series representation learning in general.