Abstract:Analyzing electroencephalographic (EEG) time series can be challenging, especially with deep neural networks, due to the large variability among human subjects and often small datasets. To address these challenges, various strategies, such as self-supervised learning, have been suggested, but they typically rely on extensive empirical datasets. Inspired by recent advances in computer vision, we propose a pretraining task termed "frequency pretraining" to pretrain a neural network for sleep staging by predicting the frequency content of randomly generated synthetic time series. Our experiments demonstrate that our method surpasses fully supervised learning in scenarios with limited data and few subjects, and matches its performance in regimes with many subjects. Furthermore, our results underline the relevance of frequency information for sleep stage scoring, while also demonstrating that deep neural networks utilize information beyond frequencies to enhance sleep staging performance, which is consistent with previous research. We anticipate that our approach will be advantageous across a broad spectrum of applications where EEG data is limited or derived from a small number of subjects, including the domain of brain-computer interfaces.
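To make the pretext task concrete, the following is a minimal sketch of the idea as described in the abstract, not the authors' implementation: synthetic signals are built as sums of sinusoids drawn from randomly chosen frequency bins, and a small network is trained to predict which bins are present. The band limits, signal length, and the toy encoder are all assumptions.

    # Hedged sketch of "frequency pretraining": predict the frequency content
    # of randomly generated synthetic time series (illustrative, not the paper's code).
    import numpy as np
    import torch
    import torch.nn as nn

    def synth_batch(batch=32, fs=100, seconds=30, n_bins=20):
        t = np.arange(fs * seconds) / fs
        x = np.zeros((batch, len(t)), dtype=np.float32)
        y = np.zeros((batch, n_bins), dtype=np.float32)
        bins = np.linspace(0.5, 30.0, n_bins)           # assumed EEG-relevant band
        for i in range(batch):
            active = np.random.rand(n_bins) < 0.2       # random frequency content
            y[i] = active
            for f in bins[active]:
                x[i] += np.sin(2 * np.pi * f * t + np.random.uniform(0, 2 * np.pi))
        return torch.from_numpy(x).unsqueeze(1), torch.from_numpy(y)

    encoder = nn.Sequential(nn.Conv1d(1, 16, 25, stride=5), nn.ReLU(),
                            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(16, 20))
    opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)
    x, y = synth_batch()
    loss = nn.BCEWithLogitsLoss()(encoder(x), y)        # multi-label frequency prediction
    loss.backward(); opt.step()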
Abstract:In recent years, data-driven modeling approaches have gained considerable traction in various meteorological applications, particularly in weather forecasting. However, these approaches often encounter challenges when dealing with extreme weather conditions. In light of this, we propose GA-SmaAt-GNet, a novel generative adversarial architecture that makes use of two methodologies aimed at enhancing the performance of deep learning models for extreme precipitation nowcasting. First, it uses a novel SmaAt-GNet, built upon the successful SmaAt-UNet architecture, as its generator. This network incorporates precipitation masks (binarized precipitation maps) as an additional data source, leveraging valuable information for improved predictions. Second, GA-SmaAt-GNet utilizes an attention-augmented discriminator inspired by the well-established Pix2Pix architecture. We assess the performance of GA-SmaAt-GNet using a real-life precipitation dataset from the Netherlands. Our experimental results reveal a notable improvement both in overall performance and for extreme precipitation events. Furthermore, we conduct an uncertainty analysis on the proposed GA-SmaAt-GNet model as well as on the precipitation dataset, providing additional insights into the predictive capabilities of the model. Finally, we offer further insights into the predictions of our proposed model using Grad-CAM. This visual explanation technique generates activation heatmaps, illustrating the areas of the input that most strongly activate various parts of the network.
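As a rough illustration of the two ingredients named above, the sketch below shows, with placeholder networks standing in for SmaAt-GNet and the attention-augmented discriminator and with assumed shapes and loss weights, how binarized precipitation masks can be stacked with the raw maps as extra generator inputs and combined with a Pix2Pix-style adversarial plus pixel-wise objective.

    # Illustrative sketch only; the real generator/discriminator are far richer.
    import torch
    import torch.nn as nn

    rain = torch.rand(8, 12, 64, 64)                 # 12 past precipitation maps
    mask = (rain > 0.1).float()                      # binarized precipitation masks
    gen_in = torch.cat([rain, mask], dim=1)          # 24 input channels

    generator = nn.Sequential(nn.Conv2d(24, 32, 3, padding=1), nn.ReLU(),
                              nn.Conv2d(32, 1, 3, padding=1))    # stand-in for SmaAt-GNet
    discriminator = nn.Sequential(nn.Conv2d(1, 16, 4, stride=2), nn.ReLU(),
                                  nn.Conv2d(16, 1, 4, stride=2))  # patch-style critic

    fake = generator(gen_in)
    target = torch.rand(8, 1, 64, 64)
    adv = nn.BCEWithLogitsLoss()(discriminator(fake), torch.ones_like(discriminator(fake)))
    pix = nn.L1Loss()(fake, target)
    g_loss = adv + 100.0 * pix                       # Pix2Pix-style weighting (assumed)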
Abstract:Accurate precipitation nowcasting is essential for various purposes, including flood prediction, disaster management, optimizing agricultural activities, and managing transportation routes and renewable energy. While several studies have addressed this challenging task from a sequence-to-sequence perspective, most of them have focused on a single area without considering the existing correlation between multiple disjoint regions. In this paper, we formulate precipitation nowcasting as a spatiotemporal graph sequence nowcasting problem. In particular, we introduce Graph Dual-stream Convolutional Attention Fusion (GD-CAF), a novel approach designed to learn from historical spatiotemporal graphs of precipitation maps and nowcast precipitation at future time steps at different spatial locations. GD-CAF consists of spatiotemporal convolutional attention modules as well as gated fusion modules, both equipped with depthwise-separable convolutional operations. This design enables the model to directly process the high-dimensional spatiotemporal graph of precipitation maps and to exploit higher-order correlations between the data dimensions. We evaluate our model on seven years of precipitation maps across Europe and its neighboring areas collected from the ERA5 dataset, provided by Copernicus. The model receives a fully connected graph in which each node represents historical observations from a specific region on the map. Consequently, each node contains a 3D tensor with time, height, and width dimensions. Experimental results demonstrate that the proposed GD-CAF model outperforms the other examined models. Furthermore, the seasonal spatial and temporal attention scores, averaged over the test set, are visualized to provide additional insights into the strongest connections between different regions or time steps. These visualizations shed light on the decision-making process of our model.
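The sketch below illustrates only the input representation described above (a fully connected graph whose nodes each hold a time-height-width tensor) and a depthwise-separable convolution applied per node; the numbers of nodes, time steps, and the attention mechanism itself are assumptions, not the GD-CAF implementation.

    import torch
    import torch.nn as nn

    n_nodes, t, h, w = 6, 12, 32, 32
    graph = torch.rand(1, n_nodes, t, h, w)              # one sample, 6 regions
    adjacency = torch.ones(n_nodes, n_nodes)             # fully connected graph

    depthwise = nn.Conv2d(t, t, kernel_size=3, padding=1, groups=t)   # per-channel filter
    pointwise = nn.Conv2d(t, t, kernel_size=1)                        # channel mixing

    per_node = [pointwise(depthwise(graph[:, i])) for i in range(n_nodes)]
    features = torch.stack(per_node, dim=1)              # (1, nodes, time, H, W)
    # Node-to-node (spatial) attention scores could then be computed from pooled
    # node features, e.g. a softmax over pairwise similarities weighted by `adjacency`.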
Abstract:Self-supervised learning addresses a challenge encountered by many supervised methods, namely the requirement for large amounts of annotated data. This challenge is particularly pronounced in fields such as the electroencephalography (EEG) research domain. Self-supervised learning instead operates by utilizing pseudo-labels, generated by pretext tasks, to obtain a rich and meaningful data representation. In this study, we introduce a dual-stream pretext task architecture that operates in both the time and frequency domains. In particular, we examine the incorporation of the novel Frequency Similarity (FS) pretext task into two existing pretext tasks, Relative Positioning (RP) and Temporal Shuffling (TS). We assess the accuracy of these models on the Physionet Challenge 2018 (PC18) dataset in the context of the downstream task of sleep stage classification. The inclusion of FS resulted in a notable improvement in downstream task accuracy, with a 1.28 percent improvement for RP and a 2.02 percent improvement for TS. Furthermore, when visualizing the learned embeddings using Uniform Manifold Approximation and Projection (UMAP), distinct clusters emerge, indicating that the learned representations carry meaningful information.
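The following is a speculative sketch of how pseudo-labels for such pretext tasks can be generated: Relative Positioning labels a pair of EEG windows by temporal proximity, and a frequency-similarity label can be derived from the distance between the windows' power spectra. The thresholds, the spectral distance measure, and this particular reading of FS are assumptions, not the authors' definitions.

    import numpy as np

    fs = 100
    eeg = np.random.randn(3600 * fs)                      # one hour of toy EEG
    win = 30 * fs

    def window(start):
        return eeg[start:start + win]

    def rp_label(t1, t2, tau_pos=60 * fs):
        return 1 if abs(t1 - t2) <= tau_pos else 0        # Relative Positioning

    def fs_label(x1, x2, thresh):                         # assumed Frequency Similarity
        p1 = np.abs(np.fft.rfft(x1)) ** 2
        p2 = np.abs(np.fft.rfft(x2)) ** 2
        dist = np.linalg.norm(p1 / p1.sum() - p2 / p2.sum())
        return 1 if dist < thresh else 0

    t1, t2 = 0, 90 * fs
    print(rp_label(t1, t2), fs_label(window(t1), window(t2), thresh=0.05))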
Abstract:This paper proposes an interpretable two-stream transformer CORAL network (TransCORALNet) for supply chain credit assessment under the segmented industry and cold start problems. The model aims to provide accurate credit assessment predictions for new supply chain borrowers with limited historical data. Here, a two-stream domain adaptation architecture with a correlation alignment (CORAL) loss is used as the core model and is equipped with a transformer, which provides insights into the learned features and allows efficient parallelization during training. Thanks to the domain adaptation capability of the proposed model, the domain shift between the source and target domains is minimized. Therefore, the model exhibits good generalization in settings where the source and target do not follow the same distribution and only a limited number of labeled target instances exist. Furthermore, we employ Local Interpretable Model-agnostic Explanations (LIME) to provide more insight into the model predictions and identify the key features contributing to supply chain credit assessment decisions. The proposed model addresses four significant supply chain credit assessment challenges: domain shift, cold start, class imbalance, and interpretability. Experimental results on a real-world dataset demonstrate the superiority of TransCORALNet over a number of state-of-the-art baselines in terms of accuracy. The code is available on GitHub at https://github.com/JieJieNiu/TransCORALN .
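The CORAL loss mentioned above has a standard closed form; the sketch below implements that textbook definition, which penalizes the Frobenius distance between source and target feature covariances. The feature dimension and batch sizes are illustrative, the transformer feature extractors are omitted, and this is not the released TransCORALNet code.

    import torch

    def coral_loss(source, target):
        # source, target: (n_samples, d) feature matrices from the two streams
        d = source.size(1)
        def covariance(x):
            x = x - x.mean(dim=0, keepdim=True)
            return (x.t() @ x) / (x.size(0) - 1)
        cs, ct = covariance(source), covariance(target)
        return ((cs - ct) ** 2).sum() / (4 * d * d)

    src = torch.randn(128, 64)      # e.g. features of source-domain borrowers
    tgt = torch.randn(32, 64)       # limited target-domain borrowers
    print(coral_loss(src, tgt))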
Abstract:The accuracy and explainability of data-driven nowcasting models are of great importance in many socio-economic sectors reliant on weather-dependent decision making. This paper proposes a novel architecture called Small Attention Residual UNet (SAR-UNet) for precipitation and cloud cover nowcasting. Here, SmaAt-UNet is used as the core model and is further equipped with residual connections placed in parallel to the depthwise separable convolutions. The proposed SAR-UNet model is evaluated on two datasets, i.e., Dutch precipitation maps ranging from 2016 to 2019 and French cloud cover binary images from 2017 to 2018. The obtained results show that SAR-UNet outperforms the other examined models in precipitation nowcasting from 30 to 180 minutes into the future as well as in cloud cover nowcasting for the next 90 minutes. Furthermore, we provide additional insights into the nowcasts made by our proposed model using Grad-CAM, a visual explanation technique, which is applied to different levels of the encoder and decoder paths of the SAR-UNet model and produces heatmaps highlighting the regions of the input image, as well as of the intermediate representations, that are critical to the precipitation nowcast. The heatmaps generated by Grad-CAM reveal the interactions between the residual connections and the depthwise separable convolutions inside the multiple depthwise separable blocks placed throughout the network architecture.
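The sketch below is our approximation of the building block described above, not the exact SAR-UNet code: a depthwise-separable convolution with a residual connection added in parallel, using a 1x1 projection when channel counts differ. Channel counts, normalization, and activation choices are assumptions.

    import torch
    import torch.nn as nn

    class ResidualDSConv(nn.Module):
        def __init__(self, in_ch, out_ch):
            super().__init__()
            self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch)
            self.pointwise = nn.Conv2d(in_ch, out_ch, 1)
            self.bn = nn.BatchNorm2d(out_ch)
            self.act = nn.ReLU()
            self.skip = nn.Conv2d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

        def forward(self, x):
            out = self.bn(self.pointwise(self.depthwise(x)))
            return self.act(out + self.skip(x))          # residual, parallel to the DS conv

    x = torch.rand(4, 12, 64, 64)                        # 12 past precipitation maps
    print(ResidualDSConv(12, 32)(x).shape)               # torch.Size([4, 32, 64, 64])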
Abstract:Designing early warning systems for harsh weather and its effects, such as urban flooding or landslides, requires accurate short-term forecasts (nowcasts) of precipitation. Nowcasting is a significant task with several additional environmental applications, such as agricultural management and increasing flight safety. In this study, we investigate the use of a UNet core model and its extensions for precipitation nowcasting in western Europe for up to 3 hours ahead. In particular, we propose the Weather Fusion UNet (WF-UNet) model, which utilizes a 3D-UNet as its core model and integrates precipitation and wind speed variables as input in the learning process, and we analyze their influence on the precipitation nowcasting task. We collected six years of precipitation and wind radar images, from January 2016 to December 2021, for 14 European countries, with 1-hour temporal resolution and 31 square km spatial resolution, based on the ERA5 dataset provided by Copernicus, the European Union's Earth observation programme. We compare the proposed WF-UNet model to the persistence model as well as to other UNet-based architectures trained only on precipitation radar input data. The obtained results show that WF-UNet outperforms the best of the other examined architectures, achieving 22%, 8% and 6% lower MSE at horizons of 1, 2 and 3 hours, respectively.
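As an illustration of the input fusion described above (assumed shapes and a toy encoder, not the WF-UNet code), precipitation and wind speed sequences can be stacked as separate channels of a 3D (time, height, width) input so that a 3D-UNet-style encoder processes both variables jointly.

    import torch
    import torch.nn as nn

    precip = torch.rand(2, 1, 12, 64, 64)      # (batch, channel, time, H, W)
    wind   = torch.rand(2, 1, 12, 64, 64)
    x = torch.cat([precip, wind], dim=1)       # two weather variables as channels

    encoder = nn.Sequential(nn.Conv3d(2, 16, kernel_size=3, padding=1), nn.ReLU(),
                            nn.MaxPool3d((1, 2, 2)))    # downsample space, keep time
    print(encoder(x).shape)                    # torch.Size([2, 16, 12, 32, 32])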
Abstract:Deep learning (DL) applied to breast tissue segmentation in magnetic resonance imaging (MRI) has received increased attention in the last decade; however, the domain shift that arises from different vendors, acquisition protocols, and biological heterogeneity remains an important but challenging obstacle on the path towards clinical implementation. Recently, unsupervised domain adaptation (UDA) methods have attempted to mitigate this problem by incorporating self-training with contrastive learning. To better exploit the underlying semantic information of the image at different levels, we propose a Multi-level Semantic-guided Contrastive Domain Adaptation (MSCDA) framework to align the feature representations between domains. In particular, we extend the contrastive loss by incorporating pixel-to-pixel, pixel-to-centroid, and centroid-to-centroid contrasts to integrate the semantic information of images. We utilize a category-wise cross-domain sampling strategy to sample anchors from target images and build a hybrid memory bank to store samples from source images. Two breast MRI datasets were retrospectively collected: the source dataset contains non-contrast MRI examinations from 11 healthy volunteers and the target dataset contains contrast-enhanced MRI examinations of 134 invasive breast cancer patients. We set up experiments from source T2W images to target dynamic contrast-enhanced (DCE)-T1W images (T2W-to-T1W) and from source T1W images to target T2W images (T1W-to-T2W). The proposed method achieved Dice similarity coefficients (DSC) of 89.2% and 84.0% in T2W-to-T1W and T1W-to-T2W, respectively, outperforming state-of-the-art methods. Notably, good performance is still achieved with a smaller source dataset, demonstrating that our framework is label-efficient.
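The sketch below shows a simplified version of one ingredient named above, the pixel-to-centroid contrast: class centroids are the mean embeddings of pixels sharing a label, and each pixel embedding is pulled toward its own class centroid with an InfoNCE-style loss. The temperature, embedding sizes, and binary labels are assumptions, and the memory bank and cross-domain sampling are omitted.

    import torch
    import torch.nn.functional as F

    emb = F.normalize(torch.randn(512, 128), dim=1)        # pixel embeddings
    labels = torch.randint(0, 2, (512,))                   # background / breast tissue

    centroids = torch.stack([emb[labels == c].mean(dim=0) for c in (0, 1)])
    centroids = F.normalize(centroids, dim=1)

    logits = emb @ centroids.t() / 0.07                    # temperature (assumed 0.07)
    loss = F.cross_entropy(logits, labels)                 # pixel-to-centroid contrast
    print(loss)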
Abstract:Accurate sound localization in reverberant environments is essential for human auditory perception. Recently, Convolutional Neural Networks (CNNs) have been utilized to model the binaural human auditory pathway. However, CNNs show limitations in capturing global acoustic features. To address this issue, we propose a novel end-to-end Binaural Audio Spectrogram Transformer (BAST) model to predict the sound azimuth in both anechoic and reverberant environments. Two implementation modes, i.e., BAST-SP and BAST-NSP, corresponding to the BAST model with shared and non-shared parameters respectively, are explored. Our model with subtraction interaural integration and a hybrid loss achieves an angular distance of 1.29 degrees and a mean squared error of 1e-3 across all azimuths, significantly surpassing the CNN-based model. An exploratory analysis of BAST's performance on the left and right hemifields and in anechoic and reverberant environments shows its generalization ability as well as the feasibility of binaural Transformers for sound localization. Furthermore, an analysis of the attention maps is provided to give additional insights into the interpretation of the localization process in a natural reverberant environment.
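A minimal sketch of the subtraction interaural integration idea follows, with assumed spectrogram shapes and a small convolutional stand-in for the transformer encoder, not the BAST code: left- and right-ear spectrograms are encoded with shared weights (as in BAST-SP) and the difference of the two embeddings is regressed onto the sound azimuth.

    import torch
    import torch.nn as nn

    left  = torch.rand(16, 1, 128, 64)           # (batch, channel, mel bins, frames)
    right = torch.rand(16, 1, 128, 64)

    encoder = nn.Sequential(nn.Conv2d(1, 32, 3, stride=2), nn.ReLU(),
                            nn.AdaptiveAvgPool2d(1), nn.Flatten())   # stand-in encoder
    head = nn.Linear(32, 2)                      # predict (cos, sin) of azimuth (assumed)

    fused = encoder(left) - encoder(right)       # subtraction interaural integration
    azimuth = head(fused)
    loss = nn.MSELoss()(azimuth, torch.rand(16, 2))   # hybrid loss omitted for brevity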
Abstract:This paper introduces a novel two-stream deep model based on the graph convolutional network (GCN) architecture and feed-forward neural networks (FFNN) for learning the solution of nonlinear partial differential equations (PDEs). The model incorporates both graph and grid input representations using two streams corresponding to the GCN and FFNN models, respectively. Each stream layer receives and processes its own input representation. As opposed to the FFNN, which receives a grid-like structure, the GCN stream layer operates on graph input data, where the neighborhood information is incorporated through the adjacency matrix of the graph. In this way, the proposed GCN-FFNN model learns from two types of input representations, i.e., grid and graph data, obtained via the discretization of the PDE domain. The GCN-FFNN model is trained in two phases. In the first phase, the model parameters of each stream are trained separately. Both streams employ the same error function to adjust their parameters, enforcing the models to satisfy the given PDE as well as its initial and boundary conditions on grid or graph collocation (training) data. In the second phase, the learned parameters of the two stream layers are frozen and their learned representation solutions are fed to fully connected layers whose parameters are learned using the same error function. The learned GCN-FFNN model is tested on test data located both inside and outside the PDE domain. The obtained numerical results demonstrate the applicability and efficiency of the proposed GCN-FFNN model over the individual GCN and FFNN models on the 1D-Burgers, 1D-Schr\"odinger, 2D-Burgers and 2D-Schr\"odinger equations.
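To make the error function concrete, the sketch below shows a standard physics-informed residual for the 1D Burgers equation applied to a small FFNN stream: the network u(x, t) is penalized so that u_t + u*u_x - nu*u_xx = 0 on collocation points. The network size, collocation sampling, and viscosity value are assumptions, and the initial/boundary terms and the GCN stream are omitted; this is not the authors' exact code.

    import math
    import torch
    import torch.nn as nn

    net = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 32), nn.Tanh(),
                        nn.Linear(32, 1))
    nu = 0.01 / math.pi                                  # assumed viscosity

    xt = torch.rand(256, 2, requires_grad=True)          # collocation points (x, t)
    u = net(xt)
    grads = torch.autograd.grad(u, xt, torch.ones_like(u), create_graph=True)[0]
    u_x, u_t = grads[:, 0:1], grads[:, 1:2]
    u_xx = torch.autograd.grad(u_x, xt, torch.ones_like(u_x), create_graph=True)[0][:, 0:1]
    pde_residual = u_t + u * u_x - nu * u_xx
    loss = (pde_residual ** 2).mean()                    # plus initial/boundary terms
    loss.backward()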