Abstract: State-of-the-art natural language processing (NLP) models are trained on massive corpora and report superlative performance on evaluation datasets. This survey examines an important attribute of these datasets: the dialect of a language. Motivated by the performance degradation of NLP models on dialectal datasets and its implications for the equity of language technologies, we survey past research in NLP for dialects in terms of datasets and approaches. We describe a wide range of NLP tasks in two categories: natural language understanding (NLU) (for tasks such as dialect classification, sentiment analysis, parsing, and NLU benchmarks) and natural language generation (NLG) (for summarisation, machine translation, and dialogue systems). The survey is also broad in its coverage of languages, which include English, Arabic, and German, among others. We observe that past work in NLP concerning dialects goes deeper than mere dialect classification. It spans early approaches that used sentence transduction through to recent approaches that integrate hypernetworks into LoRA. We expect that this survey will be useful to NLP researchers interested in building equitable language technologies by rethinking LLM benchmarks and model architectures.
Abstract: Chatbots are increasingly prevalent in commercial and scientific contexts. They help customers complain about a product or service, or help them find the best travel deals. Other bots provide mental health support or help book medical appointments. This paper argues that insights into users' language ideologies and their rapport expectations can inform the audience design of a bot's language and interaction patterns, and so help ensure equitable access to the services that bots provide. The argument is underpinned by three kinds of data: simulated user interactions with a chatbot that facilitates health appointment bookings, users' introspective comments on their interactions, and users' qualitative survey comments after engaging with the booking bot. In closing, I will define audience design for conversational AI and discuss how user-centred analyses of chatbot interactions and sociolinguistically informed theoretical approaches, such as rapport management, can be used to support audience design.