Abstract: Psychometric measures of ability, attitudes, perceptions, and beliefs are crucial for understanding user behaviors in various contexts, including health, security, e-commerce, and finance. Traditionally, psychometric dimensions have been measured and collected using survey-based methods. Inferring such constructs from user-generated text could afford opportunities for timely, unobtrusive collection and analysis. In this paper, we describe our efforts to construct a corpus for psychometric natural language processing (NLP). We discuss our multi-step process for aligning users' text with their survey-based response items and provide an overview of the resulting testbed, which encompasses survey-based psychometric measures and accompanying user-generated text from over 8,500 respondents. We report preliminary results on using the text to categorize or predict users' survey response labels. We also discuss the implications of our work and the resulting testbed for future psychometric NLP research.