Title: Multilingual Nonce Dependency Treebanks: Understanding how LLMs represent and process syntactic structure

Nov 13, 2023

David Arps, Laura Kallmeyer, Younes Samih, Hassan Sajjad

Abstract: We introduce SPUD (Semantically Perturbed Universal Dependencies), a framework for creating nonce treebanks for the multilingual Universal Dependencies (UD) corpora. SPUD data satisfies syntactic argument structure, provides syntactic annotations, and ensures grammaticality via language-specific rules. We create nonce data in Arabic, English, French, German, and Russian, and demonstrate two use cases of SPUD treebanks. First, we investigate the effect of nonce data on word co-occurrence statistics, as measured by perplexity scores of autoregressive language models (ALMs) and masked language models (MLMs). We find that ALM scores are significantly more affected by nonce data than MLM scores. Second, we show how nonce data affects the performance of syntactic dependency probes. We replicate the findings of Müller-Eberstein et al. (2022) on nonce test data and show that performance declines for both MLMs and ALMs with respect to the original test data. However, most of the performance is retained, suggesting that the probe indeed learns syntax independently of semantics.

* Our software is available at https://github.com/davidarps/spud
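
The perplexity comparison mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the SPUD repository: the function name and the per-token probabilities are hypothetical, and the abstract does not say how the paper obtains its scores. The sketch only applies the standard definition of perplexity, the exponential of the negative mean token log-probability. For masked models, a pseudo-perplexity is commonly computed with the same formula by masking each token in turn; whether the paper does so is not stated in the abstract.

```python
import math


def perplexity(token_log_probs):
    """Return exp(-mean(log p)) for a list of natural-log token probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Hypothetical per-token probabilities, invented for illustration only:
# one original sentence and a nonce version in which every token is less likely.
original = [math.log(0.25), math.log(0.5), math.log(0.5)]
nonce = [math.log(0.125), math.log(0.25), math.log(0.25)]

ppl_original = perplexity(original)
ppl_nonce = perplexity(nonce)

# A ratio above 1 means the model finds the nonce sentence more surprising.
ratio = ppl_nonce / ppl_original
print(f"original: {ppl_original:.3f}  nonce: {ppl_nonce:.3f}  ratio: {ratio:.3f}")
```

In practice the log-probabilities would come from a language model evaluated on the original and nonce treebank sentences, and the ratio (or difference) would be aggregated over the whole test set.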
