Sherman
Abstract:The 1st SpeechWellness Challenge (SW1) aims to advance methods for detecting suicidal risk in adolescents using speech analysis techniques. Suicide among adolescents is a critical public health issue globally. Early detection of suicidal tendencies can lead to timely intervention and potentially save lives. Traditional methods of assessment often rely on self-reporting or clinical interviews, which may not always be accessible. The SW1 challenge addresses this gap by exploring speech as a non-invasive and readily available indicator of mental health. We release the SW1 dataset which contains speech recordings from 600 adolescents aged 10-18 years. By focusing on speech generated from natural tasks, the challenge seeks to uncover patterns and markers that correlate with suicidal risk.
Abstract:Objective: Predicting children's future levels of externalizing problems helps to identify children at risk and guide targeted prevention. Existing studies have shown that mothers providing support in response to children's dysregulation was associated with children's lower levels of externalizing problems. The current study aims to evaluate and improve the accuracy of predicting children's externalizing problems with mother-child interaction dynamics. Method: This study used mother-child interaction dynamics during a challenging puzzle task to predict children's externalizing problems six months later (N=101, 46 boys, Mage=57.41 months, SD=6.58). Performance of the Residual Dynamic Structural Equation Model (RDSEM) was compared with the Attention-based Sequential Behavior Interaction Modeling (ASBIM) model, developed using the deep learning techniques. Results: The RDSEM revealed that children whose mothers provided more autonomy support after increases of child defeat had lower levels of externalizing problems. Five-fold cross-validation showed that the RDSEM had good prediction accuracy. The ASBIM model further improved prediction accuracy, especially after including child inhibitory control as a personalized individual feature. Conclusions: The dynamic process of mother-child interaction provides important information for predicting children's externalizing problems, especially maternal autonomy supportive response to child defeat. The deep learning model is a useful tool to further improve prediction accuracy.
Abstract:In this paper, we propose a novel dependency-aware task scheduling strategy for dynamic unmanned aerial vehicle-assisted connected autonomous vehicles (CAVs). Specifically, different computation tasks of CAVs consisting of multiple dependency subtasks are judiciously assigned to nearby CAVs or the base station for promptly completing tasks. Therefore, we formulate a joint scheduling priority and subtask assignment optimization problem with the objective of minimizing the average task completion time. The problem aims at improving the long-term system performance, which is reformulated as a Markov decision process. To solve the problem, we further propose a diffusion-based reinforcement learning algorithm, named Synthetic DDQN based Subtasks Scheduling, which can make adaptive task scheduling decision in real time. A diffusion model-based synthetic experience replay is integrated into the reinforcement learning framework, which can generate sufficient synthetic data in experience replay buffer, thereby significantly accelerating convergence and improving sample efficiency. Simulation results demonstrate the effectiveness of the proposed algorithm on reducing task completion time, comparing to benchmark schemes.
Abstract:In this paper, we investigate a resource allocation and model retraining problem for dynamic wireless networks by utilizing incremental learning, in which the digital twin (DT) scheme is employed for decision making. A two-timescale framework is proposed for computation resource allocation, mobile user association, and incremental training of user models. To obtain an optimal resource allocation and incremental learning policy, we propose an efficient two-timescale scheme based on hybrid DT-physical architecture with the objective to minimize long-term system delay. Specifically, in the large-timescale, base stations will update the user association and implement incremental learning decisions based on statistical state information from the DT system. Then, in the short timescale, an effective computation resource allocation and incremental learning data generated from the DT system is designed based on deep reinforcement learning (DRL), thus reducing the network system's delay in data transmission, data computation, and model retraining steps. Simulation results demonstrate the effectiveness of the proposed two-timescale scheme compared with benchmark schemes.
Abstract:In this paper, we propose a novel road side unit (RSU)-assisted cooperative sensing scheme for connected autonomous vehicles (CAVs), with the objective to reduce completion time of sensing tasks. Specifically, LiDAR sensing data of both RSU and CAVs are selectively fused to improve sensing accuracy, and computing resources therein are cooperatively utilized to process tasks in real time. To this end, for each task, we decide whether to compute it at the CAV or at the RSU and allocate resources accordingly. We first formulate a joint task placement and resource allocation problem for minimizing the total task completion time while satisfying sensing accuracy constraint. We then decouple the problem into two subproblems and propose a two-layer algorithm to solve them. The outer layer first makes task placement decision based on the Gibbs sampling theory, while the inner layer makes spectrum and computing resource allocation decisions via greedy-based and convex optimization subroutines, respectively. Simulation results based on the autonomous driving simulator CARLA demonstrate the effectiveness of the proposed scheme in reducing total task completion time, comparing to benchmark schemes.
Abstract:In this paper, we study a vehicle selection problem for federated learning (FL) over vehicular networks. Specifically, we design a mobility-aware vehicular federated learning (MAVFL) scheme in which vehicles drive through a road segment to perform FL. Some vehicles may drive out of the segment which leads to unsuccessful training. In the proposed scheme, the real-time successful training participation ratio is utilized to implement vehicle selection. We conduct the convergence analysis to indicate the influence of vehicle mobility on training loss. Furthermore, we propose a multi-armed bandit-based vehicle selection algorithm to minimize the utility function considering training loss and delay. The simulation results show that compared with baselines, the proposed algorithm can achieve better training performance with approximately 28\% faster convergence.
Abstract:Speech-based automatic detection of Alzheimer's disease (AD) and depression has attracted increased attention. Confidence estimation is crucial for a trust-worthy automatic diagnostic system which informs the clinician about the confidence of model predictions and helps reduce the risk of misdiagnosis. This paper investigates confidence estimation for automatic detection of AD and depression based on clinical interviews. A novel Bayesian approach is proposed which uses a dynamic Dirichlet prior distribution to model the second-order probability of the predictive distribution. Experimental results on the publicly available ADReSS and DAIC-WOZ datasets demonstrate that the proposed method outperforms a range of baselines for both classification accuracy and confidence estimation.
Abstract:In this letter, we investigate the channel estimation problem for MIMO wireless communication systems with movable antennas (MAs) at both the transmitter (Tx) and receiver (Rx). To achieve high channel estimation accuracy with low pilot training overhead, we propose a tensor decomposition-based method for estimating the parameters of multi-path channel components, including their azimuth and elevation angles, as well as complex gain coefficients, thereby reconstructing the wireless channel between any pair of Tx and Rx MA positions in the Tx and Rx regions. First, we introduce a two-stage Tx-Rx successive antenna movement pattern for pilot training, such that the received pilot signals in both stages can be expressed as a third-order tensor. Then, we obtain the factor matrices of the tensor via the canonical polyadic decomposition, and thereby estimate the angle/gain parameters for enabling the channel reconstruction between arbitrary Tx/Rx MA positions. In addition, we analyze the uniqueness condition of the tensor decomposition, which ensures the complete channel reconstruction between the whole Tx and Rx regions based on the channel measurements at only a finite number of Tx/Rx MA positions. Finally, simulation results are presented to evaluate the proposed tensor decomposition-based method as compared to existing methods, in terms of channel estimation accuracy and pilot overhead.
Abstract:The early detection of suicide risk is important since it enables the intervention to prevent potential suicide attempts. This paper studies the automatic detection of suicide risk based on spontaneous speech from adolescents, and collects a Mandarin dataset with 15 hours of suicide speech from more than a thousand adolescents aged from ten to eighteen for our experiments. To leverage the diverse acoustic and linguistic features embedded in spontaneous speech, both the Whisper speech model and textual large language models (LLMs) are used for suicide risk detection. Both all-parameter finetuning and parameter-efficient finetuning approaches are used to adapt the pre-trained models for suicide risk detection, and multiple audio-text fusion approaches are evaluated to combine the representations of Whisper and the LLM. The proposed system achieves a detection accuracy of 0.807 and an F1-score of 0.846 on the test set with 119 subjects, indicating promising potential for real suicide risk detection applications.
Abstract:In recent years, text-to-speech (TTS) technology has witnessed impressive advancements, particularly with large-scale training datasets, showcasing human-level speech quality and impressive zero-shot capabilities on unseen speakers. However, despite human subjective evaluations, such as the mean opinion score (MOS), remaining the gold standard for assessing the quality of synthetic speech, even state-of-the-art TTS approaches have kept human feedback isolated from training that resulted in mismatched training objectives and evaluation metrics. In this work, we investigate a novel topic of integrating subjective human evaluation into the TTS training loop. Inspired by the recent success of reinforcement learning from human feedback, we propose a comprehensive sampling-annotating-learning framework tailored to TTS optimization, namely uncertainty-aware optimization (UNO). Specifically, UNO eliminates the need for a reward model or preference data by directly maximizing the utility of speech generations while considering the uncertainty that lies in the inherent variability in subjective human speech perception and evaluations. Experimental results of both subjective and objective evaluations demonstrate that UNO considerably improves the zero-shot performance of TTS models in terms of MOS, word error rate, and speaker similarity. Additionally, we present a remarkable ability of UNO that it can adapt to the desired speaking style in emotional TTS seamlessly and flexibly.