Daniel L Oberski

On Text-based Personality Computing: Challenges and Future Directions

Dec 14, 2022

Qixiang Fang, Anastasia Giachanou, Ayoub Bagheri, Laura Boeschoten, Erik-Jan van Kesteren, Mahdi Shafiee Kamalabad, Daniel L Oberski

Abstract: Text-based personality computing (TPC) has attracted considerable research interest in NLP. In this paper, we describe 15 challenges that we consider deserving of the research community's attention. These challenges are organized by the following topics: personality taxonomies, measurement quality, datasets, performance evaluation, modelling choices, and ethics and fairness. In addressing each challenge, we not only combine perspectives from both NLP and the social sciences but also offer concrete suggestions towards more valid and reliable TPC research.

* Added acknowledgements

Evaluating the Construct Validity of Text Embeddings with Application to Survey Questions

Feb 18, 2022

Qixiang Fang, Dong Nguyen, Daniel L Oberski

Abstract: Text embedding models from Natural Language Processing can map text data (e.g. words, sentences, documents) to supposedly meaningful numerical representations (a.k.a. text embeddings). While such models are increasingly applied in social science research, one important issue is often not addressed: the extent to which these embeddings are valid representations of constructs relevant for social science research. We therefore propose the use of the classic construct validity framework to evaluate the validity of text embeddings. We show how this framework can be adapted to the opaque and high-dimensional nature of text embeddings, with application to survey questions. We include several popular text embedding methods (e.g. fastText, GloVe, BERT, Sentence-BERT, Universal Sentence Encoder) in our construct validity analyses. We find evidence of convergent and discriminant validity in some cases. We also show that embeddings can be used to predict respondents' answers to completely new survey questions. Furthermore, BERT-based embedding techniques and the Universal Sentence Encoder provide more valid representations of survey questions than the other methods do. Our results thus highlight the necessity of examining the construct validity of text embeddings before deploying them in social science research.

* Under review
