Abstract: Handwriting literacy is an important skill for learning and communication in school-age children. In the digital age, handwriting has been largely replaced by typing, leading to a decline in handwriting proficiency, particularly in non-alphabetic writing systems. Among children learning Chinese, a growing number have reported experiencing character amnesia: difficulty in correctly handwriting a character despite being able to recognize it. Given that there is currently no standardized diagnostic tool for assessing character amnesia in children, we developed an assessment of Chinese character amnesia for the Mandarin-speaking school-age population. We utilized a large-scale handwriting dataset in which 40 children handwrote 800 characters from dictation prompts. Character amnesia and correct handwriting responses were analyzed using a two-parameter Item Response Theory (IRT) model. Four item-selection schemes were compared: a random baseline, maximum discrimination, diverse difficulty, and an upper-and-lower-thirds discrimination score. Candidate item subsets were evaluated using out-of-sample prediction. Among these selection schemes, the upper-and-lower-thirds discrimination procedure yielded a compact 30-item test that preserves individual-difference structure and generalizes to unseen test-takers (cross-validated mean r = .74 with the full 800-item test; within-sample r = .93). This short-form test provides a reliable and efficient tool for assessing Chinese character amnesia in children and can be used to identify early handwriting and orthographic learning difficulties, contributing to the early detection of developmental dysgraphia and related literacy challenges.
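
As a rough illustration of the two modelling components named in the abstract, the sketch below shows the standard two-parameter logistic (2PL) item response function and a conventional upper-and-lower-thirds discrimination index. This is a minimal sketch under assumed conventions, not the authors' analysis code; the function names, the array layout, and the final selection step are illustrative assumptions.

import numpy as np

def p_correct_2pl(theta, a, b):
    # 2PL item response function: probability that a child with ability theta
    # handwrites a character correctly, given item discrimination a and
    # item difficulty b.
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def upper_lower_thirds_index(scores):
    # scores: (n_children, n_items) array of 0/1 handwriting accuracy.
    # Rank children by total score, take the top and bottom thirds, and
    # return, per item, the difference in proportion correct between them.
    n_children = scores.shape[0]
    order = np.argsort(scores.sum(axis=1))
    k = max(n_children // 3, 1)
    lower, upper = order[:k], order[-k:]
    return scores[upper].mean(axis=0) - scores[lower].mean(axis=0)

# Hypothetical selection step: keep the 30 characters with the largest index.
# selected_items = np.argsort(upper_lower_thirds_index(scores))[-30:]

In this sketch, items with a larger index separate strong from weak handwriters more sharply, which is the intuition behind using the index as a short-form selection criterion.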




Abstract: Large language models (LLMs) have demonstrated exceptional performance across various linguistic tasks. However, it remains uncertain whether LLMs have developed human-like fine-grained grammatical intuition. This preregistered study (https://osf.io/t5nes) presents the first large-scale investigation of ChatGPT's grammatical intuition, building upon a previous study that collected laypeople's grammatical judgments on 148 linguistic phenomena that linguists had judged to be grammatical, ungrammatical, or marginally grammatical (Sprouse, Schütze, & Almeida, 2013). Our primary focus was to compare ChatGPT with both laypeople and linguists in the judgment of these linguistic constructions. In Experiment 1, ChatGPT assigned ratings to sentences relative to a given reference sentence. Experiment 2 involved rating sentences on a 7-point scale, and Experiment 3 asked ChatGPT to choose the more grammatical sentence from a pair. Overall, our findings demonstrate convergence rates between ChatGPT and linguists ranging from 73% to 95% across tasks, with an overall point estimate of 89%. Significant correlations were also found between ChatGPT and laypeople across all tasks, though the correlation strength varied by task. We attribute these results to the psychometric nature of the judgment tasks and to differences in language-processing styles between humans and LLMs.
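
As a rough illustration of the agreement metrics reported above, the sketch below computes a convergence rate against linguists' categorical judgments (as in the forced-choice task of Experiment 3) and a rating correlation with laypeople (as in the 7-point task of Experiment 2). The data here are random placeholders and the exact scoring rules are assumptions; this is not the study's analysis code.

import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)

# Hypothetical placeholder data for the 148 phenomena: the linguist-preferred
# member of each sentence pair (0/1), ChatGPT's forced choice, ChatGPT's
# 7-point ratings, and mean laypeople ratings.
linguist_choice  = rng.integers(0, 2, size=148)
chatgpt_choice   = rng.integers(0, 2, size=148)
chatgpt_rating   = rng.uniform(1, 7, size=148)
laypeople_rating = rng.uniform(1, 7, size=148)

# Convergence rate: proportion of phenomena where ChatGPT picks the same
# sentence as the linguists.
convergence = np.mean(chatgpt_choice == linguist_choice)

# Correlation between ChatGPT's and laypeople's ratings.
r, p = pearsonr(chatgpt_rating, laypeople_rating)

print(f"convergence with linguists: {convergence:.2f}; r with laypeople: {r:.2f}")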