Abstract:Handwriting literacy is an important skill for learning and communication in school-age children. In the digital age, handwriting has been largely replaced by typing, leading to a decline in handwriting proficiency, particularly in non-alphabetic writing systems. Among children learning Chinese, a growing number have reported experiencing character amnesia: difficulty in correctly handwriting a character despite being able to recognize it. Given that there is currently no standardized diagnostic tool for assessing character amnesia in children, we developed an assessment to measure Chinese character amnesia in Mandarin-speaking school-age population. We utilised a large-scale handwriting dataset in which 40 children handwrote 800 characters from dictation prompts. Character amnesia and correct handwriting responses were analysed using a two-parameter Item Response Theory model. Four item-selection schemes were compared: random baseline, maximum discrimination, diverse difficulty, and an upper-and-lower-thirds discrimination score. Candidate item subsets were evaluated using out-of-sample prediction. Among these selection schemes, the upper-and-lower-thirds discrimination procedure yields a compact 30-item test that preserves individual-difference structure and generalizes to unseen test-takers (cross-validated mean r =.74 with full 800-item-test; within-sample r =.93). This short-form test provides a reliable and efficient tool of assessing Chinese character amnesia in children and can be used to identify early handwriting and orthographic learning difficulties, contributing to the early detection of developmental dysgraphia and related literacy challenges.




Abstract:Voice-based AI development faces unique challenges in processing both linguistic and paralinguistic information. This study compares how large audio-language models (LALMs) and humans integrate speaker characteristics during speech comprehension, asking whether LALMs process speaker-contextualized language in ways that parallel human cognitive mechanisms. We compared two LALMs' (Qwen2-Audio and Ultravox 0.5) processing patterns with human EEG responses. Using surprisal and entropy metrics from the models, we analyzed their sensitivity to speaker-content incongruency across social stereotype violations (e.g., a man claiming to regularly get manicures) and biological knowledge violations (e.g., a man claiming to be pregnant). Results revealed that Qwen2-Audio exhibited increased surprisal for speaker-incongruent content and its surprisal values significantly predicted human N400 responses, while Ultravox 0.5 showed limited sensitivity to speaker characteristics. Importantly, neither model replicated the human-like processing distinction between social violations (eliciting N400 effects) and biological violations (eliciting P600 effects). These findings reveal both the potential and limitations of current LALMs in processing speaker-contextualized language, and suggest differences in social-linguistic processing mechanisms between humans and LALMs.