Abstract: In collaboration with Postpartum Support International (PSI), a non-profit organization dedicated to supporting caregivers with postpartum mood and anxiety disorders, we developed three chatbots to provide context-specific empathetic support to postpartum caregivers, leveraging both rule-based and generative models. We present and evaluate the performance of our chatbots using both machine-based metrics and human-based questionnaires. Overall, our rule-based model achieves the best performance, with outputs that are close to the ground-truth reference and contain the highest levels of empathy. Human users prefer the rule-based chatbot over the generative chatbot for its context-specific and human-like replies. Our generative chatbot also produced empathetic responses and was described by human users as engaging. However, limitations in the training dataset often result in confusing or nonsensical responses. We conclude by discussing practical benefits of rule-based vs. generative models for supporting individuals with mental health challenges. In light of the recent surge of ChatGPT and BARD, we also discuss the possibilities and pitfalls of large language models for digital mental healthcare.
Abstract: In the domain of social signal processing, audio recognition is a promising avenue for accessing daily behaviors that contribute to health and well-being. However, despite advances in mobile computing and machine learning, audio behavior detection models are largely constrained to data collected in controlled settings, such as call centers. This is problematic because their performance is unlikely to generalize to real-world applications. In the current paper, we present a model combining deep spectrum and acoustic features to detect and classify infant distress vocalizations from 24-hour, continuous, raw real-world data collected via a wearable audio recorder. Our model dramatically outperforms infant distress detection models trained and tested on equivalent real-world datasets. In particular, our model has an F1 score of 0.597 relative to F1 scores of 0.166 and 0.26 achieved by state-of-practice and state-of-the-art real-world infant distress classifiers, respectively. We end by discussing what may have facilitated this large gain in accuracy, including the use of supervised deep spectrum features and the fact that we collected and annotated a massive dataset of 780 hours of real-world audio data with over 25 hours of labelled distress.
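For readers unfamiliar with the metric reported above, the following is a minimal sketch of how an F1 score is computed from detection counts. The function name `f1_score` and the example counts are hypothetical and chosen only for illustration; they are not taken from the paper's data or code.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, from raw counts.

    tp: true positives, fp: false positives, fn: false negatives.
    Returns 0.0 when precision and recall are both undefined or zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts (NOT from the paper): 60 correctly detected distress
# segments, 40 false alarms, 40 missed segments.
example = f1_score(tp=60, fp=40, fn=40)

# Ratios between the F1 scores quoted in the abstract.
gain_over_state_of_practice = 0.597 / 0.166
gain_over_state_of_the_art = 0.597 / 0.26
```

Because F1 is a harmonic mean, it stays low unless precision and recall are both reasonably high, which is why it is a common choice for rare-event detection such as distress in day-long recordings.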
Abstract: The developmental psychology literature suggests that the communication of affect between mothers and their infants correlates with infants' socioemotional and cognitive development. In this study, we obtained day-long audio recordings of 10 mother-infant pairs in order to study their affect communication in speech, with a focus on the mothers' speech. To build a model for speech emotion detection, we used the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) to train a convolutional neural network model that classifies 6 different emotions at 70% accuracy. We applied our model to the mothers' speech and found that it labeled the dominant emotions as angry and sad, which was not accurate. Based on our own observations, we concluded that emotional speech databases recorded with actors do not generalize well to real-life settings, suggesting an active learning or unsupervised approach for future work.
Abstract: Acoustic environments vary dramatically within the home setting. They can be a source of comfort and tranquility, or of chaos that can lead to less optimal cognitive development in children. Research to date has measured household chaos only subjectively. In this work, we use three unsupervised machine learning techniques to quantify household chaos in infants' homes. These unsupervised techniques include hierarchical clustering using K-Means, clustering using a self-organizing map (SOM), and deep learning. We evaluated these techniques using data from 9 participants, totaling 197 hours. Results show that these techniques are promising for quantifying household chaos.