Northeastern University
Abstract:Longitudinal passive sensing studies for health and behavior outcomes often have missing and incomplete data. Handling missing data effectively is thus a critical data processing and modeling step. Our formative interviews with researchers working in longitudinal health and behavior passive sensing revealed a recurring theme: most researchers consider imputation a low-priority step in their analysis and inference pipeline, opting to use simple and off-the-shelf imputation strategies without comprehensively evaluating its impact on study outcomes. Through this paper, we call attention to the importance of imputation. Using publicly available passive sensing datasets for depression, we show that prioritizing imputation can significantly impact the study outcomes -- with our proposed imputation strategies resulting in up to 31% improvement in AUROC to predict depression over the original imputation strategy. We conclude by discussing the challenges and opportunities with effective imputation in longitudinal sensing studies.
Abstract:Many hyper-personalized AI systems profile people's characteristics (e.g., personality traits) to provide personalized recommendations. These systems are increasingly used to facilitate interactions among people, such as providing teammate recommendations. Despite improved accuracy, such systems are not immune to errors when making inferences about people's most personal traits. These errors manifested as AI misrepresentations. However, the repercussions of such AI misrepresentations are unclear, especially on people's reactions and perceptions of the AI. We present two studies to examine how people react and perceive the AI after encountering personality misrepresentations in AI-facilitated team matching in a higher education context. Through semi-structured interviews (n=20) and a survey experiment (n=198), we pinpoint how people's existing and newly acquired AI knowledge could shape their perceptions and reactions of the AI after encountering AI misrepresentations. Specifically, we identified three rationales that people adopted through knowledge acquired from AI (mis)representations: AI works like a machine, human, and/or magic. These rationales are highly connected to people's reactions of over-trusting, rationalizing, and forgiving of AI misrepresentations. Finally, we found that people's existing AI knowledge, i.e., AI literacy, could moderate people's changes in their trust in AI after encountering AI misrepresentations, but not changes in people's social perceptions of AI. We discuss the role of people's AI knowledge when facing AI fallibility and implications for designing responsible mitigation and repair strategies.
Abstract:Advances in mobile and wearable technologies have enabled the potential to passively monitor a person's mental, behavioral, and affective health. These approaches typically rely on longitudinal collection of self-reported outcomes, e.g., depression, stress, and anxiety, to train machine learning (ML) models. However, the need to continuously self-report adds a significant burden on the participants, often resulting in attrition, missing labels, or insincere responses. In this work, we introduce the Scale Scores Simulation using Mental Models (SeSaMe) framework to alleviate participants' burden in digital mental health studies. By leveraging pre-trained large language models (LLMs), SeSaMe enables the simulation of participants' responses on psychological scales. In SeSaMe, researchers can prompt LLMs with information on participants' internal behavioral dispositions, enabling LLMs to construct mental models of participants to simulate their responses on psychological scales. We demonstrate an application of SeSaMe, where we use GPT-4 to simulate responses on one scale using responses from another as behavioral information. We also evaluate the alignment between human and SeSaMe-simulated responses to psychological scales. Then, we present experiments to inspect the utility of SeSaMe-simulated responses as ground truth in training ML models by replicating established depression and anxiety screening tasks from a previous study. Our results indicate SeSaMe to be a promising approach, but its alignment may vary across scales and specific prediction objectives. We also observed that model performance with simulated data was on par with using the real data for training in most evaluation scenarios. We conclude by discussing the potential implications of SeSaMe in addressing some challenges researchers face with ground-truth collection in passive sensing studies.