Abdellah Fourtassi

ILCB, LIS, TALEP

Benchmarking LLMs for Mimicking Child-Caregiver Language in Interaction

Dec 13, 2024

Jing Liu, Abdellah Fourtassi

Abstract: LLMs can generate human-like dialogues, yet their ability to simulate early child-adult interactions remains largely unexplored. In this paper, we examined how effectively LLMs can capture the distinctive features of child-caregiver language in interaction, using both static and interactive benchmarking methods. We found that state-of-the-art LLMs like Llama 3 and GPT-4o can approximate child-caregiver dialogues at the word and utterance level, but they struggle to reproduce the child and caregiver's discursive patterns, exaggerate alignment, and fail to reach the level of diversity shown by humans. The broader goal of this work is to initiate the development of a comprehensive benchmark for LLMs in child-oriented applications.

Automatic Annotation of Grammaticality in Child-Caregiver Conversations

Mar 21, 2024

Mitja Nikolaus, Abhishek Agrawal, Petros Kaklamanis, Alex Warstadt, Abdellah Fourtassi

Abstract: The acquisition of grammar has been a central question to adjudicate between theories of language acquisition. In order to conduct faster, more reproducible, and larger-scale corpus studies on grammaticality in child-caregiver conversations, tools for automatic annotation can offer an effective alternative to tedious manual annotation. We propose a coding scheme for context-dependent grammaticality in child-caregiver conversations and annotate more than 4,000 utterances from a large corpus of transcribed conversations. Based on these annotations, we train and evaluate a range of NLP models. Our results show that fine-tuned Transformer-based models perform best, achieving human inter-annotation agreement levels. As a first application and sanity check of this tool, we use the trained models to annotate a corpus almost two orders of magnitude larger than the manually annotated data and verify that children's grammaticality shows a steady increase with age. This work contributes to the growing literature on applying state-of-the-art NLP methods to help study child language acquisition at scale.

* LREC-Coling 2024, May 2024, Turin, Italy
