Abstract: We introduce an extensive dataset for multilingual probing of morphological information in language models (247 tasks across 42 languages from 10 families), each consisting of a sentence with a target word and a morphological tag as the desired label, derived from the Universal Dependencies treebanks. We find that pre-trained Transformer models (mBERT and XLM-RoBERTa) learn features that attain strong performance across these tasks. We then apply two methods to locate, for each probing task, where the disambiguating information resides in the input. The first is a new perturbation method that masks various parts of context; the second is the classical method of Shapley values. The most intriguing finding that emerges is a strong tendency for the preceding context to hold more information relevant to the prediction than the following context.
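A minimal sketch of the context-masking idea, assuming frozen mBERT features and a probe trained over the target word's contextual embedding; the model name, the left/right masking scheme, and the example sentence are illustrative assumptions, not the paper's exact setup:

```python
# Sketch: compare how much the preceding vs. following context contributes
# to a morphological probe by masking each side of the target word.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL = "bert-base-multilingual-cased"  # mBERT; XLM-RoBERTa works the same way
tok = AutoTokenizer.from_pretrained(MODEL)
enc = AutoModel.from_pretrained(MODEL).eval()

def target_embedding(words, target_idx, mask_side=None):
    """Embed the target word, optionally replacing every word on one side
    ("left" or "right") of it with the tokenizer's mask token."""
    words = list(words)
    if mask_side == "left":
        words[:target_idx] = [tok.mask_token] * target_idx
    elif mask_side == "right":
        words[target_idx + 1:] = [tok.mask_token] * (len(words) - target_idx - 1)
    batch = tok(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        hidden = enc(**batch).last_hidden_state[0]
    # average the subword vectors that belong to the target word
    ids = [i for i, w in enumerate(batch.word_ids()) if w == target_idx]
    return hidden[ids].mean(dim=0)

# A probe (e.g. logistic regression over these vectors) trained on the full
# sentences would then be evaluated on the left-masked and right-masked
# variants; the larger accuracy drop indicates which side of the context
# carries the disambiguating morphological information.
words = ["The", "sheep", "were", "grazing", "quietly"]
full = target_embedding(words, 1)
no_left = target_embedding(words, 1, mask_side="left")
no_right = target_embedding(words, 1, mask_side="right")
```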
Abstract: We examine the role of character patterns in three tasks: morphological analysis, lemmatization, and copy. We use a modified version of the standard sequence-to-sequence model in which the encoder is a pattern-matching network. Each pattern scores all possible N-character subwords (substrings) on the source side, and the highest-scoring subword's score is used both to initialize the decoder and as the input to the attention mechanism. This method allows learning which subwords of the input are important for generating the output. By training the models on the same source but different targets, we can compare which subwords are important for different tasks and how they relate to each other. We define a similarity metric, a generalized form of the Jaccard similarity, and assign a similarity score to each pair of the three tasks that work on the same source but may differ in target. We examine how these three tasks are related to each other in 12 languages. Our code is publicly available.
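A minimal sketch of one plausible form of the generalized Jaccard similarity over subword-importance scores; the sum-min/sum-max formulation and the example scores are assumptions for illustration, not necessarily the paper's exact metric:

```python
# Sketch: weighted (generalized) Jaccard similarity between the
# subword-importance profiles that two tasks assign to the same source word.
def generalized_jaccard(scores_a, scores_b):
    """scores_a, scores_b: dicts mapping a subword (substring) to a
    non-negative importance score produced by the pattern-matching encoder."""
    keys = set(scores_a) | set(scores_b)
    num = sum(min(scores_a.get(k, 0.0), scores_b.get(k, 0.0)) for k in keys)
    den = sum(max(scores_a.get(k, 0.0), scores_b.get(k, 0.0)) for k in keys)
    return num / den if den > 0 else 0.0

# e.g. comparing which trigrams of "walking" matter for lemmatization vs. copy
lemma_scores = {"wal": 0.1, "alk": 0.2, "kin": 0.7, "ing": 0.9}
copy_scores = {"wal": 0.8, "alk": 0.7, "kin": 0.6, "ing": 0.5}
print(generalized_jaccard(lemma_scores, copy_scores))
```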
Abstract: We describe a rational but low-resolution model of probability.