Abstract:The clinical diagnosis of most mental disorders relies primarily on conversations between psychiatrists and patients. Creating datasets of such diagnostic conversations promises to boost the AI mental healthcare community. However, directly collecting conversations from real diagnostic scenarios is nearly impossible because of stringent privacy and ethical constraints. To address this issue, we synthesize diagnostic conversations by exploiting anonymous patient cases, which are easier to access. Specifically, we design a neuro-symbolic multi-agent framework that uses large language models to synthesize diagnostic conversations for mental disorders. It takes a patient case as input and can generate multiple diverse conversations from a single patient case. The framework centers on the interaction between a doctor agent and a patient agent, and it places text generation under symbolic control through a dynamic diagnosis tree provided by a tool agent. Applying this framework, we develop MDD-5k, the largest Chinese mental disorders diagnosis dataset. It is built upon 1000 cleaned real patient cases obtained in cooperation with a pioneering psychiatric hospital, and it contains 5000 high-quality long conversations labelled with diagnosis results. To the best of our knowledge, it is also the first labelled Chinese mental disorders diagnosis dataset. Human evaluation demonstrates that MDD-5k successfully simulates the human-like diagnostic process for mental disorders. The dataset and code will be made publicly available at https://github.com/lemonsis/MDD-5k.
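The idea of symbolic control over generation can be illustrated with a toy sketch. Everything in it is a hypothetical stand-in, not the authors' implementation. The `Node` tree, its symptom topics and its diagnosis labels are invented for illustration. `doctor_agent` and `patient_agent` are plain functions where the framework would call a large language model. The traversal in `synthesize` plays the role of the tool agent: it decides which topic the doctor asks about next and prunes subtrees the patient denies. A seed reorders sibling topics, so a single patient case can yield several different conversations.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    topic: str                                            # symptom the doctor should ask about
    diagnosis: Optional[str] = None                       # label recorded if the patient confirms this node
    children: List["Node"] = field(default_factory=list)  # follow-up topics, explored only on confirmation


def doctor_agent(topic: str) -> str:
    # Hypothetical stand-in for an LLM call that phrases a question about `topic`.
    return f"Doctor: Have you been experiencing {topic}?"


def patient_agent(topic: str, case: Dict[str, bool]) -> Tuple[bool, str]:
    # Hypothetical stand-in for an LLM call grounded in the patient case; here a dictionary lookup.
    present = case.get(topic, False)
    return present, f"Patient: {'Yes' if present else 'No'}, regarding {topic}."


def synthesize(roots: List[Node], case: Dict[str, bool], seed: int = 0) -> Tuple[List[str], List[str]]:
    """Tool-agent loop: walk the tree depth-first, letting the case decide which branches open up."""
    rng = random.Random(seed)
    stack = list(roots)
    rng.shuffle(stack)  # seed-dependent topic order -> diverse conversations from one case
    transcript: List[str] = []
    diagnoses: List[str] = []
    while stack:
        node = stack.pop()
        transcript.append(doctor_agent(node.topic))
        present, reply = patient_agent(node.topic, case)
        transcript.append(reply)
        if not present:
            continue  # prune: follow-up topics of a denied symptom are never asked
        if node.diagnosis is not None:
            diagnoses.append(node.diagnosis)
        children = list(node.children)
        rng.shuffle(children)
        stack.extend(children)
    return transcript, diagnoses


# Toy tree and case (invented for illustration only).
tree = [
    Node("low mood", children=[Node("loss of interest", diagnosis="depressive episode")]),
    Node("excessive worry", children=[Node("restlessness", diagnosis="generalized anxiety")]),
]
case = {"low mood": True, "loss of interest": True, "excessive worry": False}
transcript, diagnoses = synthesize(tree, case, seed=1)
```

Different seeds may change the order of the questions. The set of topics asked and the diagnoses reached, however, are fixed by the tree and the case. This is one possible reading of "text generation under symbolic control".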
Abstract:Reduced-order modelling and low-dimensional surrogate models generated by machine learning algorithms have been widely applied to high-dimensional dynamical systems to improve algorithmic efficiency. In this paper, we develop a system that combines reduced-order surrogate models with a novel data assimilation (DA) technique, which incorporates real-time observations from different physical spaces. We use local smooth surrogate functions that link the space of encoded system variables to the space of current observations, which lets us perform variational DA at low computational cost. The new system, named Generalised Latent Assimilation, benefits from both the efficiency of reduced-order modelling and the accuracy of data assimilation. We also provide a theoretical analysis of the difference between the surrogate and original assimilation cost functions, giving an upper bound that depends on the size of the local training set. The new approach is tested on a high-dimensional CFD application of a two-phase liquid flow with non-linear observation operators that current Latent Assimilation methods cannot handle. Numerical results demonstrate that the proposed assimilation approach significantly improves the reconstruction and prediction accuracy of the deep learning surrogate model, which is nearly 1000 times faster than the CFD simulation.
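Here is a minimal NumPy sketch of the latent variational DA step, under stated assumptions. The abstract does not specify the surrogate family. This sketch therefore assumes a local degree-1 polynomial (linear least-squares) surrogate, fitted on samples drawn around the background latent state z_b. The observation function `h`, the latent dimensions and the error covariances `B` and `R` are hypothetical toy choices. No autoencoder is included; the latent vector is simply given. The surrogate H~ replaces the non-linear observation operator in the 3D-Var cost J(z) = (z - z_b)^T B^-1 (z - z_b) + (y - H~(z))^T R^-1 (y - H~(z)). That cost is then minimised in the latent space.

```python
import numpy as np


def fit_local_linear_surrogate(z_samples, y_samples):
    """Least-squares fit of y ~ c + H z on a local training set; returns c (m,) and H (m, n)."""
    X = np.hstack([np.ones((z_samples.shape[0], 1)), z_samples])  # design matrix [1, z]
    coef, *_ = np.linalg.lstsq(X, y_samples, rcond=None)           # shape (n + 1, m)
    return coef[0], coef[1:].T


def cost(z, z_b, y, c, H, B, R):
    """Surrogate 3D-Var cost in the latent space."""
    dz = z - z_b
    r = y - (c + H @ z)
    return dz @ np.linalg.solve(B, dz) + r @ np.linalg.solve(R, r)


def latent_3dvar(z_b, y, c, H, B, R):
    """Exact minimiser of `cost` when the surrogate is linear: z_b + K d."""
    d = y - (c + H @ z_b)                          # innovation, computed with the surrogate
    K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)   # gain (Woodbury form of the normal equations)
    return z_b + K @ d


def h(z):
    # Hypothetical non-linear observation operator acting on the latent vector.
    return np.array([z[0] ** 2, np.sin(z[1]), z[0] * z[1]])


rng = np.random.default_rng(0)
z_b = np.array([1.0, 0.5])                          # background (forecast) latent state
z_true = np.array([1.1, 0.4])
y = h(z_true)                                       # noise-free observation for the demo
z_loc = z_b + 0.2 * rng.standard_normal((50, 2))    # local training set around z_b
y_loc = np.array([h(z) for z in z_loc])
c, H = fit_local_linear_surrogate(z_loc, y_loc)
B = 0.1 * np.eye(2)                                 # background error covariance (assumed)
R = 1e-3 * np.eye(3)                                # observation error covariance (assumed)
z_a = latent_3dvar(z_b, y, c, H, B, R)              # analysis latent state
```

The surrogate here is linear, so the minimiser has a closed form. A higher-degree local surrogate would instead need an iterative optimiser. The gap between the surrogate cost and the original cost depends on how well the local fit captures `h`. That fit is controlled by the local training set, which is the quantity the abstract's upper bound is stated in terms of.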
Abstract:This study is motivated by the scale of the high school dropout problem in Louisiana and its negative impacts on individual and public well-being. Our goal is to predict which students are at risk of dropping out of high school, using a Louisiana administrative dataset. Because the dataset is imbalanced, we apply imbalanced learning techniques, including resampling, case weighting, and cost-sensitive learning, to improve prediction performance on the rare class. Performance is measured by the F-measure, recall, and precision of the rare class. Using an administrative dataset of over 366k records from the Louisiana Department of Education, we compare several machine learning algorithms, such as neural networks, decision trees, and bagging trees, in combination with these imbalanced learning approaches. Experiments show that imbalanced learning methods yield good recall but lower precision. Base classifiers that do not handle the imbalance give better precision but poor recall. Overall, applying imbalanced learning techniques is beneficial, but further studies are needed to improve precision.
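The rare-class metrics and two of the imbalance remedies named above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the study's pipeline, and the function names are hypothetical. Random oversampling stands in for "resampling". Inverse-frequency class weights stand in for "case weighting"; they use the same heuristic as scikit-learn's `class_weight='balanced'`.

```python
import numpy as np


def rare_class_metrics(y_true, y_pred, positive=1):
    """Precision, recall and F-measure computed for the rare (positive) class only."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def random_oversample(X, y, positive=1, seed=0):
    """Duplicate rare-class rows (sampling with replacement) until both classes are equally frequent."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X), np.asarray(y)
    pos_idx = np.flatnonzero(y == positive)
    neg_idx = np.flatnonzero(y != positive)
    extra = rng.choice(pos_idx, size=max(len(neg_idx) - len(pos_idx), 0), replace=True)
    idx = np.concatenate([neg_idx, pos_idx, extra])
    return X[idx], y[idx]


def balanced_class_weights(y):
    """Per-class weights n_samples / (n_classes * count), so rarer classes weigh more."""
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))
```

Apply oversampling to the training split only. Duplicating rare-class rows before splitting would leak copies into the evaluation data and inflate recall.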
Abstract:Electroencephalography (EEG) has become the most significant input signal for brain-computer interface (BCI) systems. However, satisfactory classification accuracy is very difficult to obtain because traditional methods cannot fully exploit multimodal information. We propose a novel approach that models cognitive events from EEG data by reducing the task to a video classification problem, designed to preserve the multimodal information in EEG. In addition, we introduce optical flow to represent the time-varying information in EEG. We train a deep neural network (DNN) combining a convolutional neural network (CNN) and a recurrent neural network (RNN) on the EEG videos and their optical flow for the EEG classification task. Experiments demonstrate that our approach is more robust and more accurate in EEG classification tasks. Based on this approach, we designed a mixed BCI-based rehabilitation support system to help stroke patients perform basic operations.
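One possible way to turn multichannel EEG into "video" is to interpolate the electrode values onto a 2D scalp grid at every time step. The sketch below assumes inverse-distance-weighted interpolation and hypothetical electrode coordinates in the unit square. The abstract does not say which projection or interpolation the authors used. Optical flow between consecutive frames could then be estimated with a standard routine such as OpenCV's `cv2.calcOpticalFlowFarneback`, which is not reproduced here.

```python
import numpy as np


def eeg_to_video(eeg, positions, grid_size=16, power=2.0):
    """Interpolate EEG onto a square grid, one frame per time sample.

    eeg:       (channels, time) array of signal values
    positions: (channels, 2) hypothetical electrode coordinates in [0, 1]^2
    returns:   (time, grid_size, grid_size) array of frames
    """
    eeg = np.asarray(eeg, dtype=float)
    positions = np.asarray(positions, dtype=float)
    xs = np.linspace(0.0, 1.0, grid_size)
    gx, gy = np.meshgrid(xs, xs)                         # gx varies along columns, gy along rows
    pixels = np.stack([gx.ravel(), gy.ravel()], axis=1)  # (grid_size**2, 2) pixel coordinates
    dist = np.linalg.norm(pixels[:, None, :] - positions[None, :, :], axis=2)  # (pixels, channels)
    w = 1.0 / np.maximum(dist, 1e-6) ** power            # inverse-distance weights; clamp avoids /0
    w /= w.sum(axis=1, keepdims=True)                    # weights per pixel sum to 1
    frames = (w @ eeg).T                                 # (time, pixels)
    return frames.reshape(eeg.shape[1], grid_size, grid_size)


# Usage: 4 hypothetical electrodes at the grid corners, 5 time samples.
positions = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
eeg = np.arange(20, dtype=float).reshape(4, 5)
video = eeg_to_video(eeg, positions, grid_size=8)
```

The resulting (time, H, W) array could be passed through a CNN frame by frame, with an RNN over the time axis, in the spirit of the CNN and RNN pairing described above. The grid size and the weighting exponent are free parameters of this sketch.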