Abstract:Mental health issues widely vary across individuals - the manifestations of signs and symptoms can be fairly heterogeneous. Recently, language-based depression and anxiety assessments have shown promise for capturing this heterogeneous nature by evaluating a patient's own language, but such approaches require a large sample of words per person to be accurate. In this work, we introduce adaptive language-based assessment - the task of iteratively estimating an individual's psychological score based on limited language responses to questions that the model also decides to ask. To this end, we explore two statistical learning-based approaches for measurement/scoring: classical test theory (CTT) and item response theory (IRT). We find that using adaptive testing in general can significantly reduce the number of questions required to achieve high validity (r ~ 0.7) with standardized tests, bringing down from 11 total questions down to 3 for depression and 5 for anxiety. Given the combinatorial nature of the problem, we empirically evaluate multiple strategies for both the ordering and scoring objectives, introducing two new methods: a semi-supervised item response theory based method (ALIRT), and a supervised actor-critic based model. While both of the models achieve significant improvements over random and fixed orderings, we find ALIRT to be a scalable model that achieves the highest accuracy with lower numbers of questions (e.g. achieves Pearson r ~ 0.93 after only 3 questions versus asking all 11 questions). Overall, ALIRT allows prompting a reduced number of questions without compromising accuracy or overhead computational costs.