Abstract: We present an in-depth mechanistic interpretability analysis of training small transformers on an elementary task, counting, which is a crucial deductive step in many algorithms. In particular, we investigate the collaboration/competition among the attention heads: we ask whether the attention heads behave as a pseudo-ensemble, all solving the same subtask, or whether they perform different subtasks, meaning that they can solve the original task only in conjunction. Our work presents evidence that, with respect to the semantics of the counting task, the attention heads behave as a pseudo-ensemble, but that their outputs must be aggregated in a non-uniform manner to produce an encoding that conforms to the syntax. Our source code will be available upon publication.
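A minimal sketch, not the paper's model, of the distinction the abstract draws: several attention heads produce similar per-head outputs (a pseudo-ensemble), yet the update written back to the model's representation mixes them with learned, non-uniform weights via the standard output projection. All dimensions and variable names below are illustrative assumptions.

```python
import torch
from torch import nn

n_heads, d_head, d_model = 4, 16, 64

# Per-head outputs for a single token position (illustrative random values).
head_outputs = torch.randn(n_heads, d_head)

# Uniform aggregation: every head contributes equally, as a plain ensemble would.
uniform = head_outputs.mean(dim=0)

# Non-uniform aggregation: a learned projection (the usual multi-head W_O applied
# to the concatenated heads) weights each head's output differently, so heads that
# solve the same subtask can still contribute unequally to the final encoding.
w_o = nn.Linear(n_heads * d_head, d_model, bias=False)
non_uniform = w_o(head_outputs.reshape(-1))

print(uniform.shape, non_uniform.shape)  # torch.Size([16]) torch.Size([64])
```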
Abstract: To track trends in the perception of literary translation around the political transformation of 1989 in Hungary, a coding system was developed for the paragraphs of the 1980-1999 issues of the literary journal Alföld. This paper describes how we trained BERT models to carry the coding system over to the 1980-1999 issues of the literary journal Nagyvilág. We use extensive hyperparameter tuning, loss functions robust to label imbalance, 10-fold cross-validation for precise evaluation, a model ensemble for prediction, manual validation on the prediction set, and a new calibration method to better predict label counts for sections of the Nagyvilág corpus; to study the relations between labels, we also construct label relation networks.
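A minimal sketch, not the authors' pipeline, of two ingredients the abstract names: a loss robust to label imbalance and 10-fold cross-validation for a multi-label coding task. A simple linear classifier over precomputed features stands in for the fine-tuned BERT models; the array names and hyperparameters are assumptions for illustration.

```python
import numpy as np
import torch
from torch import nn
from sklearn.model_selection import KFold

def pos_weights(y: np.ndarray) -> torch.Tensor:
    """Per-label positive-class weights (#negatives / #positives) to counter imbalance."""
    pos = y.sum(axis=0)
    neg = y.shape[0] - pos
    return torch.tensor(neg / np.clip(pos, 1, None), dtype=torch.float32)

def run_cv(x: np.ndarray, y: np.ndarray, n_splits: int = 10, epochs: int = 20) -> None:
    """10-fold cross-validation with an imbalance-weighted multi-label loss."""
    for fold, (tr, va) in enumerate(KFold(n_splits, shuffle=True, random_state=0).split(x)):
        model = nn.Linear(x.shape[1], y.shape[1])  # stand-in for a BERT encoder + head
        loss_fn = nn.BCEWithLogitsLoss(pos_weight=pos_weights(y[tr]))
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        xt, yt = torch.tensor(x[tr]).float(), torch.tensor(y[tr]).float()
        for _ in range(epochs):
            opt.zero_grad()
            loss_fn(model(xt), yt).backward()
            opt.step()
        with torch.no_grad():
            preds = (torch.sigmoid(model(torch.tensor(x[va]).float())) > 0.5).numpy()
        print(f"fold {fold}: exact-match accuracy = {(preds == y[va]).all(axis=1).mean():.3f}")
```

In this sketch the per-label `pos_weight` plays the role of the imbalance-robust loss, and averaging the per-fold scores (or ensembling the per-fold models for prediction) mirrors the cross-validation and ensembling steps mentioned above.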