Abstract:The development of EEG decoding algorithms confronts challenges such as data sparsity, subject variability, and the need for precise annotations, all of which are vital for advancing brain-computer interfaces and enhancing the diagnosis of diseases. To address these issues, we propose a novel two-stage approach named Self-Supervised State Reconstruction-Primed Riemannian Dynamics (EEG-ReMinD) , which mitigates reliance on supervised learning and integrates inherent geometric features. This approach efficiently handles EEG data corruptions and reduces the dependency on labels. EEG-ReMinD utilizes self-supervised and geometric learning techniques, along with an attention mechanism, to analyze the temporal dynamics of EEG features within the framework of Riemannian geometry, referred to as Riemannian dynamics. Comparative analyses on both intact and corrupted datasets from two different neurodegenerative disorders underscore the enhanced performance of EEG-ReMinD.
Abstract:Multimodal sentiment analysis aims to learn representations from different modalities to identify human emotions. However, existing works often neglect the frame-level redundancy inherent in continuous time series, resulting in incomplete modality representations with noise. To address this issue, we propose temporal-invariant learning for the first time, which constrains the distributional variations over time steps to effectively capture long-term temporal dynamics, thus enhancing the quality of the representations and the robustness of the model. To fully exploit the rich semantic information in textual knowledge, we propose a semantic-guided fusion module. By evaluating the correlations between different modalities, this module facilitates cross-modal interactions gated by modality-invariant representations. Furthermore, we introduce a modality discriminator to disentangle modality-invariant and modality-specific subspaces. Experimental results on two public datasets demonstrate the superiority of our model. Our code is available at https://github.com/X-G-Y/SATI.
Abstract:Multimodal sentiment recognition aims to learn representations from different modalities to identify human emotions. However, previous works does not suppresses the frame-level redundancy inherent in continuous time series, resulting in incomplete modality representations with noise. To address this issue, we propose the Temporal-invariant learning, which minimizes the distributional differences between time steps to effectively capture smoother time series patterns, thereby enhancing the quality of the representations and robustness of the model. To fully exploit the rich semantic information in textual knowledge, we propose a Text-Driven Fusion Module (TDFM). To guide cross-modal interactions, TDFM evaluates the correlations between different modality through modality-invariant representations. Furthermore, we introduce a modality discriminator to disentangle modality-invariant and modality-specific subspaces. Experimental results on two public datasets demonstrate the superiority of our model.