Abstract:Motor imagery (MI) based brain-computer interfaces (BCIs) enable the direct control of external devices through the imagined movements of various body parts. Unlike previous systems that used fixed-length EEG trials for MI decoding, asynchronous BCIs aim to detect the user's MI without explicit triggers. They are challenging to implement, because the algorithm needs to first distinguish between resting-states and MI trials, and then classify the MI trials into the correct task, all without any triggers. This paper proposes a sliding window prescreening and classification (SWPC) approach for MI-based asynchronous BCIs, which consists of two modules: a prescreening module to screen MI trials out of the resting-state, and a classification module for MI classification. Both modules are trained with supervised learning followed by self-supervised learning, which refines the feature extractors. Within-subject and cross-subject asynchronous MI classifications on four different EEG datasets validated the effectiveness of SWPC, i.e., it always achieved the highest average classification accuracy, and outperformed the best state-of-the-art baseline on each dataset by about 2%.
Abstract:Objective: An electroencephalogram (EEG)-based brain-computer interface (BCI) enables direct communication between the human brain and a computer. Due to individual differences and non-stationarity of EEG signals, such BCIs usually require a subject-specific calibration session before each use, which is time-consuming and user-unfriendly. Transfer learning (TL) has been proposed to shorten or eliminate this calibration, but existing TL approaches mainly consider offline settings, where all unlabeled EEG trials from the new user are available. Methods: This paper proposes Test-Time Information Maximization Ensemble (T-TIME) to accommodate the most challenging online TL scenario, where unlabeled EEG data from the new user arrive in a stream, and immediate classification is performed. T-TIME initializes multiple classifiers from the aligned source data. When an unlabeled test EEG trial arrives, T-TIME first predicts its labels using ensemble learning, and then updates each classifier by conditional entropy minimization and adaptive marginal distribution regularization. Our code is publicized. Results: Extensive experiments on three public motor imagery based BCI datasets demonstrated that T-TIME outperformed about 20 classical and state-of-the-art TL approaches. Significance: To our knowledge, this is the first work on test time adaptation for calibration-free EEG-based BCIs, making plug-and-play BCIs possible.
Abstract:A brain-computer interface (BCI) enables direct communication between the human brain and external devices. Electroencephalography (EEG) based BCIs are currently the most popular for able-bodied users. To increase user-friendliness, usually a small amount of user-specific EEG data are used for calibration, which may not be enough to develop a pure data-driven decoding model. To cope with this typical calibration data shortage challenge in EEG-based BCIs, this paper proposes a parameter-free channel reflection (CR) data augmentation approach that incorporates prior knowledge on the channel distributions of different BCI paradigms in data augmentation. Experiments on eight public EEG datasets across four different BCI paradigms (motor imagery, steady-state visual evoked potential, P300, and seizure classifications) using different decoding algorithms demonstrated that: 1) CR is effective, i.e., it can noticeably improve the classification accuracy; 2) CR is robust, i.e., it consistently outperforms existing data augmentation approaches in the literature; and, 3) CR is flexible, i.e., it can be combined with other data augmentation approaches to further increase the performance. We suggest that data augmentation approaches like CR should be an essential step in EEG-based BCIs. Our code is available online.
Abstract:Training an accurate classifier for EEG-based brain-computer interface (BCI) requires EEG data from a large number of users, whereas protecting their data privacy is a critical consideration. Federated learning (FL) is a promising solution to this challenge. This paper proposes Federated classification with local Batch-specific batch normalization and Sharpness-aware minimization (FedBS) for privacy protection in EEG-based motor imagery (MI) classification. FedBS utilizes local batch-specific batch normalization to reduce data discrepancies among different clients, and sharpness-aware minimization optimizer in local training to improve model generalization. Experiments on three public MI datasets using three popular deep learning models demonstrated that FedBS outperformed six state-of-the-art FL approaches. Remarkably, it also outperformed centralized training, which does not consider privacy protection at all. In summary, FedBS protects user EEG data privacy, enabling multiple BCI users to participate in large-scale machine learning model training, which in turn improves the BCI decoding accuracy.
Abstract:Spiking neural networks (SNNs) aim to simulate real neural networks in the human brain with biologically plausible neurons. The leaky integrate-and-fire (LIF) neuron is one of the most widely studied SNN architectures. However, it has the vanishing gradient problem when trained with backpropagation. Additionally, its neuronal parameters are often manually specified and fixed, in contrast to the heterogeneity of real neurons in the human brain. This paper proposes a gated parametric neuron (GPN) to process spatio-temporal information effectively with the gating mechanism. Compared with the LIF neuron, the GPN has two distinguishing advantages: 1) it copes well with the vanishing gradients by improving the flow of gradient propagation; and, 2) it learns spatio-temporal heterogeneous neuronal parameters automatically. Additionally, we use the same gate structure to eliminate initial neuronal parameter selection and design a hybrid recurrent neural network-SNN structure. Experiments on two spike-based audio datasets demonstrated that the GPN network outperformed several state-of-the-art SNNs, could mitigate vanishing gradients, and had spatio-temporal heterogeneous parameters. Our work shows the ability of SNNs to handle long-term dependencies and achieve high performance simultaneously.
Abstract:Forecasting faithful trajectories of multivariate time series from practical scopes is essential for reasonable decision-making. Recent methods majorly tailor generative conditional diffusion models to estimate the target temporal predictive distribution. However, it remains an obstacle to enhance the exploitation efficiency of given implicit temporal predictive information to bolster conditional diffusion learning. To this end, we propose a generic channel-aware Contrastive Conditional Diffusion model entitled CCDM to achieve desirable Multivariate probabilistic forecasting, obviating the need for curated temporal conditioning inductive biases. In detail, we first design a channel-centric conditional denoising network to manage intra-variate variations and cross-variate correlations, which can lead to scalability on diverse prediction horizons and channel numbers. Then, we devise an ad-hoc denoising-based temporal contrastive learning to explicitly amplify the predictive mutual information between past observations and future forecasts. It can coherently complement naive step-wise denoising diffusion training and improve the forecasting accuracy and generality on unknown test time series. Besides, we offer theoretic insights on the benefits of such auxiliary contrastive training refinement from both neural mutual information and temporal distribution generalization aspects. The proposed CCDM can exhibit superior forecasting capability compared to current state-of-the-art diffusion forecasters over a comprehensive benchmark, with best MSE and CRPS outcomes on $66.67\%$ and $83.33\%$ cases. Our code is publicly available at https://github.com/LSY-Cython/CCDM.
Abstract:This paper aims to address the challenge of sparse and missing data in recommendation systems, a significant hurdle in the age of big data. Traditional imputation methods struggle to capture complex relationships within the data. We propose a novel approach that fine-tune Large Language Model (LLM) and use it impute missing data for recommendation systems. LLM which is trained on vast amounts of text, is able to understand complex relationship among data and intelligently fill in missing information. This enriched data is then used by the recommendation system to generate more accurate and personalized suggestions, ultimately enhancing the user experience. We evaluate our LLM-based imputation method across various tasks within the recommendation system domain, including single classification, multi-classification, and regression compared to traditional data imputation methods. By demonstrating the superiority of LLM imputation over traditional methods, we establish its potential for improving recommendation system performance.
Abstract:This paper presents a novel approach to enhance image-to-image generation by leveraging the multimodal capabilities of the Large Language and Vision Assistant (LLaVA). We propose a framework where LLaVA analyzes input images and generates textual descriptions, hereinafter LLaVA-generated prompts. These prompts, along with the original image, are fed into the image-to-image generation pipeline. This enriched representation guides the generation process towards outputs that exhibit a stronger resemblance to the input image. Extensive experiments demonstrate the effectiveness of LLaVA-generated prompts in promoting image similarity. We observe a significant improvement in the visual coherence between the generated and input images compared to traditional methods. Future work will explore fine-tuning LLaVA prompts for increased control over the creative process. By providing more specific details within the prompts, we aim to achieve a delicate balance between faithfulness to the original image and artistic expression in the generated outputs.
Abstract:This paper presents a novel contribution to the field of regional style transfer. Existing methods often suffer from the drawback of applying style homogeneously across the entire image, leading to stylistic inconsistencies or foreground object twisted when applied to image with foreground elements such as person figures. To address this limitation, we propose a new approach that leverages a segmentation network to precisely isolate foreground objects within the input image. Subsequently, style transfer is applied exclusively to the background region. The isolated foreground objects are then carefully reintegrated into the style-transferred background. To enhance the visual coherence between foreground and background, a color transfer step is employed on the foreground elements prior to their rein-corporation. Finally, we utilize feathering techniques to achieve a seamless amalgamation of foreground and background, resulting in a visually unified and aesthetically pleasing final composition. Extensive evaluations demonstrate that our proposed approach yields significantly more natural stylistic transformations compared to conventional methods.
Abstract:Due to the vast electric vehicle (EV) penetration to distribution grid, charging load forecasting is essential to promote charging station operation and demand-side management.However, the stochastic charging behaviors and associated exogenous factors render future charging load patterns quite volatile and hard to predict. Accordingly, we devise a novel Diffusion model termed DiffPLF for Probabilistic Load Forecasting of EV charging, which can explicitly approximate the predictive load distribution conditioned on historical data and related covariates. Specifically, we leverage a denoising diffusion model, which can progressively convert the Gaussian prior to real time-series data by learning a reversal of the diffusion process. Besides, we couple such diffusion model with a cross-attention-based conditioning mechanism to execute conditional generation for possible charging demand profiles. We also propose a task-informed fine-tuning technique to better adapt DiffPLF to the probabilistic time-series forecasting task and acquire more accurate and reliable predicted intervals. Finally, we conduct multiple experiments to validate the superiority of DiffPLF to predict complex temporal patterns of erratic charging load and carry out controllable generation based on certain covariate. Results demonstrate that we can attain a notable rise of 39.58% and 49.87% on MAE and CRPS respectively compared to the conventional method.