Abstract:Fuzzy Neural Networks (FNNs) are effective machine learning models for classification tasks, commonly based on the Takagi-Sugeno-Kang (TSK) fuzzy system. However, when faced with high-dimensional data, especially with noise, FNNs encounter challenges such as vanishing gradients, excessive fuzzy rules, and limited access to prior knowledge. To address these challenges, we propose a novel fuzzy system, the Self-Constructing Multi-Expert Fuzzy System (SOME-FS). It combines two learning strategies: mixed structure learning and multi-expert advanced learning. The former enables each base classifier to effectively determine its structure without requiring prior knowledge, while the latter tackles the issue of vanishing gradients by enabling each rule to focus on its local region, thereby enhancing the robustness of the fuzzy classifiers. The overall ensemble architecture enhances the stability and prediction performance of the fuzzy system. Our experimental results demonstrate that the proposed SOME-FS is effective in high-dimensional tabular data, especially in dealing with uncertainty. Moreover, our stable rule mining process can identify concise and core rules learned by the SOME-FS.
Abstract:The rapid evolution of Brain-Computer Interfaces (BCIs) has significantly influenced the domain of human-computer interaction, with Steady-State Visual Evoked Potentials (SSVEP) emerging as a notably robust paradigm. This study explores advanced classification techniques leveraging interpretable fuzzy transfer learning (iFuzzyTL) to enhance the adaptability and performance of SSVEP-based systems. Recent efforts have strengthened to reduce calibration requirements through innovative transfer learning approaches, which refine cross-subject generalizability and minimize calibration through strategic application of domain adaptation and few-shot learning strategies. Pioneering developments in deep learning also offer promising enhancements, facilitating robust domain adaptation and significantly improving system responsiveness and accuracy in SSVEP classification. However, these methods often require complex tuning and extensive data, limiting immediate applicability. iFuzzyTL introduces an adaptive framework that combines fuzzy logic principles with neural network architectures, focusing on efficient knowledge transfer and domain adaptation. iFuzzyTL refines input signal processing and classification in a human-interpretable format by integrating fuzzy inference systems and attention mechanisms. This approach bolsters the model's precision and aligns with real-world operational demands by effectively managing the inherent variability and uncertainty of EEG data. The model's efficacy is demonstrated across three datasets: 12JFPM (89.70% accuracy for 1s with an information transfer rate (ITR) of 149.58), Benchmark (85.81% accuracy for 1s with an ITR of 213.99), and eldBETA (76.50% accuracy for 1s with an ITR of 94.63), achieving state-of-the-art results and setting new benchmarks for SSVEP BCI performance.
Abstract:The paper introduces a Fuzzy-based Attention (Fuzzy Attention Layer) mechanism, a novel computational approach to enhance the interpretability and efficacy of neural models in psychological research. The proposed Fuzzy Attention Layer mechanism is integrated as a neural network layer within the Transformer Encoder model to facilitate the analysis of complex psychological phenomena through neural signals, such as those captured by functional Near-Infrared Spectroscopy (fNIRS). By leveraging fuzzy logic, the Fuzzy Attention Layer is capable of learning and identifying interpretable patterns of neural activity. This capability addresses a significant challenge when using Transformer: the lack of transparency in determining which specific brain activities most contribute to particular predictions. Our experimental results demonstrated on fNIRS data from subjects engaged in social interactions involving handholding reveal that the Fuzzy Attention Layer not only learns interpretable patterns of neural activity but also enhances model performance. Additionally, the learned patterns provide deeper insights into the neural correlates of interpersonal touch and emotional exchange. The application of our model shows promising potential in deciphering the subtle complexities of human social behaviors, thereby contributing significantly to the fields of social neuroscience and psychological AI.
Abstract:This paper presents a pioneering exploration into the integration of fine-grained human supervision within the autonomous driving domain to enhance system performance. The current advances in End-to-End autonomous driving normally are data-driven and rely on given expert trials. However, this reliance limits the systems' generalizability and their ability to earn human trust. Addressing this gap, our research introduces a novel approach by synchronously collecting data from human and machine drivers under identical driving scenarios, focusing on eye-tracking and brainwave data to guide machine perception and decision-making processes. This paper utilizes the Carla simulation to evaluate the impact brought by human behavior guidance. Experimental results show that using human attention to guide machine attention could bring a significant improvement in driving performance. However, guidance by human intention still remains a challenge. This paper pioneers a promising direction and potential for utilizing human behavior guidance to enhance autonomous systems.
Abstract:Driving under drowsy conditions significantly escalates the risk of vehicular accidents. Although recent efforts have focused on using electroencephalography to detect drowsiness, helping prevent accidents caused by driving in such states, seamless human-machine interaction in driving scenarios requires a more versatile EEG-based system. This system should be capable of understanding a driver's intention while demonstrating resilience to artifacts induced by sudden movements. This paper pioneers a novel research direction in BCI-assisted driving, studying the neural patterns related to driving intentions and presenting a novel method for driving intention prediction. In particular, our preliminary analysis of the EEG signal using independent component analysis suggests a close relation between the intention of driving maneuvers and the neural activities in central-frontal and parietal areas. Power spectral density analysis at a group level also reveals a notable distinction among various driving intentions in the frequency domain. To exploit these brain dynamics, we propose a novel Masked EEG Modeling framework for predicting human driving intentions, including the intention for left turning, right turning, and straight proceeding. Extensive experiments, encompassing comprehensive quantitative and qualitative assessments on public dataset, demonstrate the proposed method is proficient in predicting driving intentions across various vigilance states. Specifically, our model attains an accuracy of 85.19% when predicting driving intentions for drowsy subjects, which shows its promising potential for mitigating traffic accidents related to drowsy driving. Notably, our method maintains over 75% accuracy when more than half of the channels are missing or corrupted, underscoring its adaptability in real-life driving.
Abstract:Decoding linguistic information from non-invasive brain signals using EEG has gained increasing research attention due to its vast applicational potential. Recently, a number of works have adopted a generative-based framework to decode electroencephalogram (EEG) signals into sentences by utilizing the power generative capacity of pretrained large language models (LLMs). However, this approach has several drawbacks that hinder the further development of linguistic applications for brain-computer interfaces (BCIs). Specifically, the ability of the EEG encoder to learn semantic information from EEG data remains questionable, and the LLM decoder's tendency to generate sentences based on its training memory can be hard to avoid. These issues necessitate a novel approach for converting EEG signals into sentences. In this paper, we propose a novel two-step pipeline that addresses these limitations and enhances the validity of linguistic EEG decoding research. We first confirm that word-level semantic information can be learned from EEG data recorded during natural reading by training a Conformer encoder via a masked contrastive objective for word-level classification. To achieve sentence decoding results, we employ a training-free retrieval method to retrieve sentences based on the predictions from the EEG encoder. Extensive experiments and ablation studies were conducted in this paper for a comprehensive evaluation of the proposed approach. Visualization of the top prediction candidates reveals that our model effectively groups EEG segments into semantic categories with similar meanings, thereby validating its ability to learn patterns from unspoken EEG recordings. Despite the exploratory nature of this work, these results suggest that our method holds promise for providing more reliable solutions for converting EEG signals into text.
Abstract:Ensembling multiple models has always been an effective approach to push the limits of existing performance and is widely used in classification tasks by simply averaging the classification probability vectors from multiple classifiers to achieve better accuracy. However, in the thriving open-source Large Language Model (LLM) community, ensembling methods are rare and typically limited to ensembling the full-text outputs of LLMs, such as selecting the best output using a ranker, which leads to underutilization of token-level probability information. In this paper, we treat the Generation of each token by LLMs as a Classification (GaC) for ensembling. This approach fully exploits the probability information at each generation step and better prevents LLMs from producing early incorrect tokens that lead to snowballing errors. In experiments, we ensemble state-of-the-art LLMs on several benchmarks, including exams, mathematics and reasoning, and observe that our method breaks the existing community performance ceiling. Furthermore, we observed that most of the tokens in the answer are simple and do not affect the correctness of the final answer. Therefore, we also experimented with ensembling only key tokens, and the results showed better performance with lower latency across benchmarks.
Abstract:Multi-agent systems often require agents to collaborate with or compete against other agents with diverse goals, behaviors, or strategies. Agent modeling is essential when designing adaptive policies for intelligent machine agents in multiagent systems, as this is the means by which the ego agent understands other agents' behavior and extracts their meaningful policy representations. These representations can be used to enhance the ego agent's adaptive policy which is trained by reinforcement learning. However, existing agent modeling approaches typically assume the availability of local observations from other agents (modeled agents) during training or a long observation trajectory for policy adaption. To remove these constrictive assumptions and improve agent modeling performance, we devised a Contrastive Learning-based Agent Modeling (CLAM) method that relies only on the local observations from the ego agent during training and execution. With these observations, CLAM is capable of generating consistent high-quality policy representations in real-time right from the beginning of each episode. We evaluated the efficacy of our approach in both cooperative and competitive multi-agent environments. Our experiments demonstrate that our approach achieves state-of-the-art on both cooperative and competitive tasks, highlighting the potential of contrastive learning-based agent modeling for enhancing reinforcement learning.
Abstract:This paper presents BELT, a novel model and learning framework for the pivotal topic of brain-to-language translation research. The translation from noninvasive brain signals into readable natural language has the potential to promote the application scenario as well as the development of brain-computer interfaces (BCI) as a whole. The critical problem in brain signal decoding or brain-to-language translation is the acquisition of semantically appropriate and discriminative EEG representation from a dataset of limited scale and quality. The proposed BELT method is a generic and efficient framework that bootstraps EEG representation learning using off-the-shelf large-scale pretrained language models (LMs). With a large LM's capacity for understanding semantic information and zero-shot generalization, BELT utilizes large LMs trained on Internet-scale datasets to bring significant improvements to the understanding of EEG signals. In particular, the BELT model is composed of a deep conformer encoder and a vector quantization encoder. Semantical EEG representation is achieved by a contrastive learning step that provides natural language supervision. We achieve state-of-the-art results on two featuring brain decoding tasks including the brain-to-language translation and zero-shot sentiment classification. Specifically, our model surpasses the baseline model on both tasks by 5.45% and over 10% and archives a 42.31% BLEU-1 score and 67.32% precision on the main evaluation metrics for translation and zero-shot sentiment classification respectively.
Abstract:Making sense of multiple modalities can yield a more comprehensive description of real-world phenomena. However, learning the co-representation of diverse modalities is still a long-standing endeavor in emerging machine learning applications and research. Previous generative approaches for multimodal input approximate a joint-modality posterior by uni-modality posteriors as product-of-experts (PoE) or mixture-of-experts (MoE). We argue that these approximations lead to a defective bound for the optimization process and loss of semantic connection among modalities. This paper presents a novel variational method on sets called the Set Multimodal VAE (SMVAE) for learning a multimodal latent space while handling the missing modality problem. By modeling the joint-modality posterior distribution directly, the proposed SMVAE learns to exchange information between multiple modalities and compensate for the drawbacks caused by factorization. In public datasets of various domains, the experimental results demonstrate that the proposed method is applicable to order-agnostic cross-modal generation while achieving outstanding performance compared to the state-of-the-art multimodal methods. The source code for our method is available online https://anonymous.4open.science/r/SMVAE-9B3C/.