Abstract: Reconstructing speech envelopes from EEG signals is essential for exploring the neural mechanisms underlying speech perception. Yet EEG variability across subjects and physiological artifacts complicate accurate reconstruction. To address this problem, we introduce the Subject Disentangling Neural Network (SDN-Net), which disentangles subject identity information from reconstructed speech envelopes to enhance cross-subject reconstruction accuracy. SDN-Net integrates three key components: MLA-Codec, MPN-MI, and CTA-MTDNN. The MLA-Codec, a fully convolutional neural network, decodes EEG signals into speech envelopes. The CTA-MTDNN module, a multi-scale time-delay neural network with channel and temporal attention, extracts subject identity features from EEG signals. Finally, the MPN-MI module, a mutual information estimator built on a multi-layer perceptron, supervises the removal of subject identity information from the reconstructed speech envelope. Experiments on the Auditory EEG Decoding Dataset demonstrate that SDN-Net achieves superior performance in both within-subject and cross-subject speech envelope reconstruction compared to recent state-of-the-art methods.
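To make the disentanglement idea concrete, below is a minimal PyTorch sketch of an MLP-based mutual information estimator used as a penalty on subject identity, in the spirit of the MPN-MI module described above; all dimensions, names, and the training recipe are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (PyTorch) of MLP-based mutual information estimation used as a
# disentanglement penalty; illustrative only, not the authors' MPN-MI code.
import torch
import torch.nn as nn

class MIEstimator(nn.Module):
    """MINE-style lower-bound estimator of I(envelope; subject embedding)."""
    def __init__(self, env_dim, subj_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(env_dim + subj_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, env, subj):
        # Joint samples: aligned (envelope, subject-embedding) pairs.
        joint = self.net(torch.cat([env, subj], dim=-1))
        # Marginal samples: shuffle subject embeddings across the batch.
        subj_shuf = subj[torch.randperm(subj.size(0))]
        marginal = self.net(torch.cat([env, subj_shuf], dim=-1))
        # Donsker-Varadhan bound on mutual information.
        return joint.mean() - torch.log(torch.exp(marginal).mean() + 1e-8)

# Training idea: the decoder minimizes (reconstruction loss + lambda * MI estimate),
# while the estimator itself is trained to maximize the bound.
env = torch.randn(32, 64)    # pooled reconstructed-envelope features (illustrative)
subj = torch.randn(32, 192)  # subject-identity embeddings (illustrative)
mi = MIEstimator(64, 192)
print(mi(env, subj))
```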
Abstract: The scarcity of speaker-annotated far-field speech presents a significant challenge in developing high-performance far-field speaker verification (SV) systems. While data augmentation using large-scale near-field speech has been a common strategy to address this limitation, the acoustic-environment mismatch between near-field and far-field speech significantly limits the resulting gains in far-field SV. In this paper, we propose an adaptive speech augmentation approach leveraging NaturalSpeech3, a pre-trained foundation text-to-speech (TTS) model, to convert near-field speech into far-field speech by incorporating far-field acoustic ambient noise for data augmentation. Specifically, we use FACodec from NaturalSpeech3 to decompose the speech waveform into distinct embedding subspaces (content, prosody, speaker, and residual/acoustic-detail embeddings) and to reconstruct the waveform from these disentangled representations. In our method, the prosody, content, and residual embeddings of far-field speech are combined with speaker embeddings from near-field speech to generate augmented pseudo far-field speech that preserves the speaker identity of the out-of-domain near-field speech while retaining the acoustic environment of the in-domain far-field speech. This approach not only serves as an effective strategy for augmenting training data for far-field speaker verification but also extends to cross-data augmentation for enrollment and test speech in evaluation trials. Experimental results on FFSVC demonstrate that the adaptive data augmentation method significantly outperforms traditional approaches, such as random noise addition and reverberation, as well as other competitive data augmentation strategies.
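A rough sketch of the recombination step is shown below; the facodec_encode / facodec_decode helpers are hypothetical stand-ins (FACodec's real interface is not reproduced here), and only the swap of the speaker embedding is the point being illustrated.

```python
# Illustrative sketch of the embedding-recombination idea. The helper functions
# below are placeholders, NOT the real FACodec API from NaturalSpeech3.
import torch

def facodec_encode(wave):
    # Hypothetical stand-in for a factorized codec encoder: returns dummy
    # (content, prosody, speaker, residual) embeddings of plausible shapes.
    frames = wave.shape[-1] // 320
    return (torch.zeros(1, frames, 256), torch.zeros(1, frames, 256),
            torch.zeros(1, 256), torch.zeros(1, frames, 256))

def facodec_decode(content, prosody, speaker, residual):
    # Hypothetical stand-in for the matching decoder.
    return torch.zeros(1, content.shape[1] * 320)

def make_pseudo_far_field(near_wave, far_wave):
    """Combine far-field content/prosody/residual with the near-field speaker."""
    c_far, p_far, _s_far, r_far = facodec_encode(far_wave)
    _c_near, _p_near, s_near, _r_near = facodec_encode(near_wave)
    # Keep the far-field acoustic environment, swap in the near-field identity.
    return facodec_decode(c_far, p_far, s_near, r_far)

print(make_pseudo_far_field(torch.zeros(1, 64000), torch.zeros(1, 64000)).shape)
```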
Abstract: Contrast-enhanced brain MRI (CE-MRI) is a valuable diagnostic technique but may pose health risks and incur high costs. To create safer alternatives, multi-modality medical image translation aims to synthesize CE-MRI images from other available modalities. Although existing methods can generate promising predictions, they still face two challenges: they tend to be over-confident, and they offer little interpretability for their predictions. To address these challenges, this paper introduces TrustI2I, a novel trustworthy method that reformulates the multi-to-one medical image translation problem as a multimodal regression problem, aiming to build an uncertainty-aware and reliable system. Specifically, our method leverages deep evidential regression to estimate prediction uncertainties and employs an explicit intermediate- and late-fusion strategy based on the Mixture of Normal-Inverse Gamma (MoNIG) distribution, enhancing both synthesis quality and interpretability. Additionally, we incorporate uncertainty calibration to improve the reliability of the uncertainty estimates. Validation on the BraTS2018 dataset demonstrates that our approach surpasses current methods, producing higher-quality images with rational uncertainty estimation.
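For intuition, the following PyTorch sketch shows a standard Normal-Inverse-Gamma evidential regression head and the usual aleatoric/epistemic uncertainty read-outs that such a system builds on; layer shapes and names are illustrative assumptions, and the MoNIG fusion itself is omitted.

```python
# Minimal sketch (PyTorch) of a Normal-Inverse-Gamma (NIG) evidential regression
# head of the kind deep evidential regression uses; illustrative, not TrustI2I.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NIGHead(nn.Module):
    """Predicts per-pixel NIG parameters (mu, nu, alpha, beta)."""
    def __init__(self, in_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, 4, kernel_size=1)

    def forward(self, feats):
        mu, log_nu, log_alpha, log_beta = self.conv(feats).chunk(4, dim=1)
        nu = F.softplus(log_nu)               # nu > 0
        alpha = F.softplus(log_alpha) + 1.0   # alpha > 1
        beta = F.softplus(log_beta)           # beta > 0
        return mu, nu, alpha, beta

def uncertainties(nu, alpha, beta):
    aleatoric = beta / (alpha - 1)            # expected data noise
    epistemic = beta / (nu * (alpha - 1))     # model (evidential) uncertainty
    return aleatoric, epistemic

feats = torch.randn(2, 32, 64, 64)            # dummy decoder features
mu, nu, alpha, beta = NIGHead(32)(feats)
print(uncertainties(nu, alpha, beta)[0].shape)
```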
Abstract: Electroencephalography (EEG) plays a vital role in detecting how the brain responds to different stimuli. In this paper, we propose a novel Shallow-Deep Attention-based Network (SDANet) to identify the auditory stimulus that evoked a given EEG signal. It adopts the Attention-based Correlation Module (ACM) to discover the connection between auditory speech and EEG from a global perspective, and the Shallow-Deep Similarity Classification Module (SDSCM) to derive the classification result from embeddings learned in the shallow and deep layers. Moreover, various training strategies and data augmentation techniques are used to boost model robustness. Experiments are conducted on the dataset provided by the Auditory EEG Challenge (ICASSP Signal Processing Grand Challenge 2023). Results show that the proposed model achieves a significant gain over the baseline on the match-mismatch track.
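As a concrete illustration of attention-based EEG-speech correlation for the match-mismatch task, here is a generic cross-attention scorer; it is a simplified stand-in, not the authors' ACM/SDSCM design, and all dimensions are assumptions.

```python
# Illustrative PyTorch sketch of an attention-based EEG-speech correlation scorer
# for match-mismatch classification; generic cross-attention, not SDANet itself.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MatchScorer(nn.Module):
    def __init__(self, eeg_dim=64, speech_dim=28, d_model=128):
        super().__init__()
        self.eeg_proj = nn.Linear(eeg_dim, d_model)
        self.sp_proj = nn.Linear(speech_dim, d_model)
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)

    def forward(self, eeg, speech):
        # eeg: (B, T, eeg_dim), speech: (B, T, speech_dim)
        q = self.eeg_proj(eeg)
        kv = self.sp_proj(speech)
        # EEG queries attend over the candidate speech segment (global correlation).
        ctx, _ = self.attn(q, kv, kv)
        # Pool over time and score similarity between EEG and attended speech.
        return F.cosine_similarity(q.mean(1), ctx.mean(1), dim=-1)

eeg = torch.randn(8, 320, 64)       # dummy EEG segment
speech = torch.randn(8, 320, 28)    # dummy speech feature segment (e.g., mel bands)
print(MatchScorer()(eeg, speech).shape)   # higher score -> "match"
```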
Abstract: In this paper, the problem of orientation correction in cardiac MRI is investigated, and a framework for orientation recognition via deep neural networks is proposed. For multi-modality MRI, we introduce a transfer learning strategy that adapts the proposed model from a single modality to multiple modalities. We embed the proposed network into an orientation correction command-line tool that performs orientation correction on 2D DICOM and 3D NIfTI images. Our source code, network models, and tools are available at https://github.com/Jy-stdio/MSCMR_orient/
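For illustration, a minimal sketch of applying a predicted orientation class to a 3D NIfTI volume is shown below; the label-to-transform mapping is hypothetical and much simpler than the released MSCMR_orient tool.

```python
# Minimal illustration of correcting a 3D NIfTI volume given a predicted
# orientation label; the label set and transforms here are hypothetical.
import numpy as np
import nibabel as nib

# Hypothetical label set: 0 = correct, 1 = horizontal flip, 2 = vertical flip,
# 3 = 180-degree rotation in the in-plane dimensions.
def correct_orientation(volume, label):
    if label == 1:
        return volume[::-1, :, :]
    if label == 2:
        return volume[:, ::-1, :]
    if label == 3:
        return volume[::-1, ::-1, :]
    return volume

img = nib.Nifti1Image(np.random.rand(128, 128, 12).astype(np.float32), np.eye(4))
fixed = correct_orientation(np.asarray(img.dataobj), label=3)
# Note: a production tool would also update the affine/header, not just the array.
nib.save(nib.Nifti1Image(np.ascontiguousarray(fixed), img.affine), "corrected.nii.gz")
```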
Abstract: Recognizing human emotions plays a critical role in our daily communication. Neuroscience has demonstrated that different emotional states produce different degrees of activation across brain regions, EEG frequency bands, and temporal slices. In this paper, we propose a novel structure to explore informative EEG features for emotion recognition. The proposed module, denoted PST-Attention, consists of Positional, Spectral, and Temporal Attention modules that extract more discriminative EEG features. Specifically, the Positional Attention module captures the regions activated by different emotions in the spatial dimension, while the Spectral and Temporal Attention modules assign weights to different frequency bands and temporal slices, respectively. Our method is adaptive and efficient, and can be attached to 3D Convolutional Neural Networks (3D-CNNs) as a plug-in module. We conduct experiments on two real-world datasets. A 3D-CNN combined with our module achieves promising results and demonstrates that PST-Attention is able to capture stable patterns for emotion recognition from EEG.
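To illustrate the plug-in idea, the following squeeze-and-excitation-style sketch re-weights a 5D EEG feature cube along its spectral, temporal, and spatial dimensions; it approximates the PST-Attention concept and is not the authors' exact module.

```python
# SE-style sketch (PyTorch) of positional/spectral/temporal re-weighting for a
# 5D EEG feature cube (batch, bands, time, height, width); illustrative only.
import torch
import torch.nn as nn

class PSTAttentionSketch(nn.Module):
    def __init__(self, bands, steps, reduction=4):
        super().__init__()
        self.spectral = nn.Sequential(
            nn.Linear(bands, bands // reduction), nn.ReLU(),
            nn.Linear(bands // reduction, bands), nn.Sigmoid())
        self.temporal = nn.Sequential(
            nn.Linear(steps, steps // reduction), nn.ReLU(),
            nn.Linear(steps // reduction, steps), nn.Sigmoid())
        self.positional = nn.Sequential(
            nn.Conv2d(1, 1, kernel_size=3, padding=1), nn.Sigmoid())

    def forward(self, x):                                    # x: (B, F, T, H, W)
        b, f, t, h, w = x.shape
        w_f = self.spectral(x.mean(dim=(2, 3, 4)))           # band weights (B, F)
        w_t = self.temporal(x.mean(dim=(1, 3, 4)))           # slice weights (B, T)
        w_p = self.positional(x.mean(dim=(1, 2)).unsqueeze(1))  # map (B, 1, H, W)
        x = x * w_f.view(b, f, 1, 1, 1)
        x = x * w_t.view(b, 1, t, 1, 1)
        return x * w_p.view(b, 1, 1, h, w)

x = torch.randn(2, 4, 8, 9, 9)   # 4 bands, 8 temporal slices, 9x9 electrode map
print(PSTAttentionSketch(bands=4, steps=8)(x).shape)
```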
Abstract: Electroencephalogram (EEG) signals can objectively reflect emotional states and their changes. However, how EEG propagates in the brain and how it relates to emotion remain poorly understood. This paper presents a novel approach to EEG emotion recognition built exclusively on self-attention over the spectral, spatial, and temporal dimensions, exploring the contribution of different EEG electrodes and temporal slices to specific emotional states. Our method, named the EEG emotion Transformer (EeT), adapts the conventional Transformer architecture to EEG signals by enabling spatio-spectral feature learning directly from EEG sequences. Our experimental results demonstrate that "joint attention", in which temporal and spatial attention are applied simultaneously within each block, yields the best emotion recognition accuracy among the design choices. In addition, compared with other competitive methods, the proposed method achieves state-of-the-art results on the SEED and SEED-IV datasets.
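A minimal sketch of the "joint attention" variant, in which spatial and temporal positions are flattened into a single token sequence so each block attends over both at once, is given below; the embedding scheme and dimensions are illustrative assumptions, not the exact EeT configuration.

```python
# Minimal PyTorch sketch of joint space-time attention over EEG tokens;
# illustrative only, not the authors' EeT implementation.
import torch
import torch.nn as nn

class JointAttentionBlockSketch(nn.Module):
    def __init__(self, feat_dim=5, d_model=64, heads=4):
        super().__init__()
        self.embed = nn.Linear(feat_dim, d_model)   # per (electrode, slice) features
        self.block = nn.TransformerEncoderLayer(
            d_model, nhead=heads, dim_feedforward=128, batch_first=True)

    def forward(self, x):
        # x: (batch, time_slices, electrodes, feat_dim), e.g. band powers per slice.
        b, t, s, f = x.shape
        tokens = self.embed(x).reshape(b, t * s, -1)   # joint space-time tokens
        return self.block(tokens)                      # attention over all tokens

x = torch.randn(2, 6, 62, 5)    # 6 slices, 62 electrodes, 5 frequency-band features
print(JointAttentionBlockSketch()(x).shape)            # (2, 372, 64)
```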