Lisa Green

Corpus-Guided Contrast Sets for Morphosyntactic Feature Detection in Low-Resource English Varieties

Sep 15, 2022

Tessa Masis, Anissa Neal, Lisa Green, Brendan O'Connor

Abstract: The study of language variation examines how language varies between and within different groups of speakers, shedding light on how we use language to construct identities and how social contexts affect language use. A common method is to identify instances of a certain linguistic feature - say, the zero copula construction - in a corpus, and analyze the feature's distribution across speakers, topics, and other variables, to either gain a qualitative understanding of the feature's function or systematically measure variation. In this paper, we explore the challenging task of automatic morphosyntactic feature detection in low-resource English varieties. We present a human-in-the-loop approach to generate and filter effective contrast sets via corpus-guided edits. We show that our approach improves feature detection for both Indian English and African American English, demonstrate how it can assist linguistic research, and release our fine-tuned models for use by other researchers.

* Field Matters Workshop at COLING 2022

Demographic Dialectal Variation in Social Media: A Case Study of African-American English

Aug 31, 2016

Su Lin Blodgett, Lisa Green, Brendan O'Connor

Abstract: Though dialectal language is increasingly abundant on social media, few resources exist for developing NLP tools to handle such language. We conduct a case study of dialectal language in online conversational text by investigating African-American English (AAE) on Twitter. We propose a distantly supervised model to identify AAE-like language from demographics associated with geo-located messages, and we verify that this language follows well-known AAE linguistic phenomena. In addition, we analyze the quality of existing language identification and dependency parsing tools on AAE-like text, demonstrating that they perform poorly on such text compared to text associated with white speakers. We also provide an ensemble classifier for language identification which eliminates this disparity and release a new corpus of tweets containing AAE-like language.

* To be published in EMNLP 2016, 15 pages
