School of Physics, Beijing Institute of Technology, China, Beijing Academy of Quantum Information Sciences, China
Abstract:Mental health is a critical global public health issue, and psychological support hotlines play a pivotal role in providing mental health assistance and identifying suicide risks at an early stage. However, the emotional expressions conveyed during these calls remain underexplored in current research. This study introduces a method that combines pitch acoustic features with deep learning-based features to analyze and understand emotions expressed during hotline interactions. Using data from China's largest psychological support hotline, our method achieved an F1-score of 79.13% for negative binary emotion classification.Additionally, the proposed approach was validated on an open dataset for multi-class emotion classification,where it demonstrated better performance compared to the state-of-the-art methods. To explore its clinical relevance, we applied the model to analysis the frequency of negative emotions and the rate of emotional change in the conversation, comparing 46 subjects with suicidal behavior to those without. While the suicidal group exhibited more frequent emotional changes than the non-suicidal group, the difference was not statistically significant.Importantly, our findings suggest that emotional fluctuation intensity and frequency could serve as novel features for psychological assessment scales and suicide risk prediction.The proposed method provides valuable insights into emotional dynamics and has the potential to advance early intervention and improve suicide prevention strategies through integration with clinical tools and assessments The source code is publicly available at https://github.com/Sco-field/Speechemotionrecognition/tree/main.
Abstract:We consider the problem of contextual kernel bandits with stochastic contexts, where the underlying reward function belongs to a known Reproducing Kernel Hilbert Space (RKHS). We study this problem under the additional constraint of joint differential privacy, where the agents needs to ensure that the sequence of query points is differentially private with respect to both the sequence of contexts and rewards. We propose a novel algorithm that improves upon the state of the art and achieves an error rate of $\mathcal{O}\left(\sqrt{\frac{\gamma_T}{T}} + \frac{\gamma_T}{T \varepsilon}\right)$ after $T$ queries for a large class of kernel families, where $\gamma_T$ represents the effective dimensionality of the kernel and $\varepsilon > 0$ is the privacy parameter. Our results are based on a novel estimator for the reward function that simultaneously enjoys high utility along with a low-sensitivity to observed rewards and contexts, which is crucial to obtain an order optimal learning performance with improved dependence on the privacy parameter.
Abstract:We consider the problem of differentially private stochastic convex optimization (DP-SCO) in a distributed setting with $M$ clients, where each of them has a local dataset of $N$ i.i.d. data samples from an underlying data distribution. The objective is to design an algorithm to minimize a convex population loss using a collaborative effort across $M$ clients, while ensuring the privacy of the local datasets. In this work, we investigate the accuracy-communication-privacy trade-off for this problem. We establish matching converse and achievability results using a novel lower bound and a new algorithm for distributed DP-SCO based on Vaidya's plane cutting method. Thus, our results provide a complete characterization of the accuracy-communication-privacy trade-off for DP-SCO in the distributed setting.
Abstract:As the prevalence of mental health challenges, social media has emerged as a key platform for individuals to express their emotions.Deep learning tends to be a promising solution for analyzing mental health on social media. However, black box models are often inflexible when switching between tasks, and their results typically lack explanations. With the rise of large language models (LLMs), their flexibility has introduced new approaches to the field. Also due to the generative nature, they can be prompted to explain decision-making processes. However, their performance on complex psychological analysis still lags behind deep learning. In this paper, we introduce the first multi-task Chinese Social Media Interpretable Mental Health Instructions (C-IMHI) dataset, consisting of 9K samples, which has been quality-controlled and manually validated. We also propose MentalGLM series models, the first open-source LLMs designed for explainable mental health analysis targeting Chinese social media, trained on a corpus of 50K instructions. The proposed models were evaluated on three downstream tasks and achieved better or comparable performance compared to deep learning models, generalized LLMs, and task fine-tuned LLMs. We validated a portion of the generated decision explanations with experts, showing promising results. We also evaluated the proposed models on a clinical dataset, where they outperformed other LLMs, indicating their potential applicability in the clinical field. Our models show strong performance, validated across tasks and perspectives. The decision explanations enhance usability and facilitate better understanding and practical application of the models. Both the constructed dataset and the models are publicly available via: https://github.com/zwzzzQAQ/MentalGLM.
Abstract:Suicide is a pressing global issue, demanding urgent and effective preventive interventions. Among the various strategies in place, psychological support hotlines had proved as a potent intervention method. Approximately two million people in China attempt suicide annually, with many individuals making multiple attempts. Prompt identification and intervention for high-risk individuals are crucial to preventing tragedies. With the rapid advancement of artificial intelligence (AI), especially the development of large-scale language models (LLMs), new technological tools have been introduced to the field of mental health. This study included 1284 subjects, and was designed to validate whether deep learning models and LLMs, using audio and transcribed text from support hotlines, can effectively predict suicide risk. We proposed a simple LLM-based pipeline that first summarizes transcribed text from approximately one hour of speech to extract key features, and then predict suicidial bahaviours in the future. We compared our LLM-based method with the traditional manual scale approach in a clinical setting and with five advanced deep learning models. Surprisingly, the proposed simple LLM pipeline achieved strong performance on a test set of 46 subjects, with an F1 score of 76\% when combined with manual scale rating. This is 7\% higher than the best speech-based deep learning models and represents a 27.82\% point improvement in F1 score compared to using the manual scale apporach alone. Our study explores new applications of LLMs and demonstrates their potential for future use in suicide prevention efforts.
Abstract:Psychological support hotlines are an effective suicide prevention measure that typically relies on professionals using suicide risk assessment scales to predict individual risk scores. However, the accuracy of scale-based predictive methods for suicide risk assessment can vary widely depending on the expertise of the operator. This limitation underscores the need for more reliable methods, prompting this research's innovative exploration of the use of artificial intelligence to improve the accuracy and efficiency of suicide risk prediction within the context of psychological support hotlines. The study included data from 1,549 subjects from 2015-2017 in China who contacted a psychological support hotline. Each participant was followed for 12 months to identify instances of suicidal behavior. We proposed a novel multi-task learning method that uses the large-scale pre-trained model Whisper for feature extraction and fits psychological scales while predicting the risk of suicide. The proposed method yields a 2.4\% points improvement in F1-score compared to the traditional manual approach based on the psychological scales. Our model demonstrated superior performance compared to the other eight popular models. To our knowledge, this study is the first to apply deep learning to long-term speech data to predict suicide risk in China, indicating grate potential for clinical applications. The source code is publicly available at: \url{https://github.com/songchangwei/Suicide-Risk-Prediction}.
Abstract:Cognitive Behavioral Therapy (CBT) is a well-established intervention for mitigating psychological issues by modifying maladaptive cognitive and behavioral patterns. However, delivery of CBT is often constrained by resource limitations and barriers to access. Advancements in artificial intelligence (AI) have provided technical support for the digital transformation of CBT. Particularly, the emergence of pre-training models (PTMs) and large language models (LLMs) holds immense potential to support, augment, optimize and automate CBT delivery. This paper reviews the literature on integrating AI into CBT interventions. We begin with an overview of CBT. Then, we introduce the integration of AI into CBT across various stages: pre-treatment, therapeutic process, and post-treatment. Next, we summarized the datasets relevant to some CBT-related tasks. Finally, we discuss the benefits and current limitations of applying AI to CBT. We suggest key areas for future research, highlighting the need for further exploration and validation of the long-term efficacy and clinical utility of AI-enhanced CBT. The transformative potential of AI in reshaping the practice of CBT heralds a new era of more accessible, efficient, and personalized mental health interventions.
Abstract:BPSO algorithm is a swarm intelligence optimization algorithm, which has the characteristics of good optimization effect, high efficiency and easy to implement. In recent years, it has been used to optimize a variety of machine learning and deep learning models, such as CNN, LSTM, SVM, etc. But it is easy to fall into local optimum for the lack of exploitation ability. It is found that in the article, which is different from previous studies, The reason for the poor performance is an error existing in their velocity update function, which leads to abnormal and chaotic behavior of particles. This not only makes the algorithm difficult to converge, but also often searches the repeated space. So, traditionally, it has to rely on a low w value in the later stage to force these algorithms to converge, but also makes them quickly lose their search ability and prone to getting trapped in local optima. This article proposes a velocity legacy term correction method for all V-shaped BPSOs. Experimentals based on 0/1 knapsack problems show that it has a significant effect on accuracy and efficiency for all of the 4 commonly used V-Shaped BPSOs. Therefore it is an significant breakthrough in the field of swarm intelligence.
Abstract:Affective Forecasting, a research direction in psychology that predicts individuals future emotions, is often constrained by numerous external factors like social influence and temporal distance. To address this, we transform Affective Forecasting into a Deep Learning problem by designing an Emotion Forecasting paradigm based on two-party interactions. We propose a novel Emotion Forecasting (EF) task grounded in the theory that an individuals emotions are easily influenced by the emotions or other information conveyed during interactions with another person. To tackle this task, we have developed a specialized dataset, Human-interaction-based Emotion Forecasting (Hi-EF), which contains 3069 two-party Multilayered-Contextual Interaction Samples (MCIS) with abundant affective-relevant labels and three modalities. Hi-EF not only demonstrates the feasibility of the EF task but also highlights its potential. Additionally, we propose a methodology that establishes a foundational and referential baseline model for the EF task and extensive experiments are provided. The dataset and code is available at https://github.com/Anonymize-Author/Hi-EF.
Abstract:In the field of affective computing, fully leveraging information from a variety of sensory modalities is essential for the comprehensive understanding and processing of human emotions. Inspired by the process through which the human brain handles emotions and the theory of cross-modal plasticity, we propose UMBEnet, a brain-like unified modal affective processing network. The primary design of UMBEnet includes a Dual-Stream (DS) structure that fuses inherent prompts with a Prompt Pool and a Sparse Feature Fusion (SFF) module. The design of the Prompt Pool is aimed at integrating information from different modalities, while inherent prompts are intended to enhance the system's predictive guidance capabilities and effectively manage knowledge related to emotion classification. Moreover, considering the sparsity of effective information across different modalities, the SSF module aims to make full use of all available sensory data through the sparse integration of modality fusion prompts and inherent prompts, maintaining high adaptability and sensitivity to complex emotional states. Extensive experiments on the largest benchmark datasets in the Dynamic Facial Expression Recognition (DFER) field, including DFEW, FERV39k, and MAFW, have proven that UMBEnet consistently outperforms the current state-of-the-art methods. Notably, in scenarios of Modality Missingness and multimodal contexts, UMBEnet significantly surpasses the leading current methods, demonstrating outstanding performance and adaptability in tasks that involve complex emotional understanding with rich multimodal information.