Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Alexander Ertl

mCPT at SemEval-2023 Task 3: Multilingual Label-Aware Contrastive Pre-Training of Transformers for Few- and Zero-shot Framing Detection

Mar 17, 2023

Markus Reiter-Haas, Alexander Ertl, Kevin Innerhofer, Elisabeth Lex

Abstract:This paper presents the winning system for the zero-shot Spanish framing detection task, which also achieves competitive places in eight additional languages. The challenge of the framing detection task lies in identifying a set of 14 frames when only a few or zero samples are available, i.e., a multilingual multi-label few- or zero-shot setting. Our developed solution employs a pre-training procedure based on multilingual Transformers using a label-aware contrastive loss function. In addition to describing the system, we perform an embedding space analysis and ablation study to demonstrate how our pre-training procedure supports framing detection to advance computational framing analysis.

* Manuscript submitted for publication at SemEval'23

Via

Access Paper or Ask Questions

Gradient-based Weight Density Balancing for Robust Dynamic Sparse Training

Nov 03, 2022

Mathias Parger, Alexander Ertl, Paul Eibensteiner, Joerg H. Mueller, Martin Winter, Markus Steinberger

Abstract:Training a sparse neural network from scratch requires optimizing connections at the same time as the weights themselves. Typically, the weights are redistributed after a predefined number of weight updates, removing a fraction of the parameters of each layer and inserting them at different locations in the same layers. The density of each layer is determined using heuristics, often purely based on the size of the parameter tensor. While the connections per layer are optimized multiple times during training, the density of each layer remains constant. This leaves great unrealized potential, especially in scenarios with a high sparsity of 90% and more. We propose Global Gradient-based Redistribution, a technique which distributes weights across all layers - adding more weights to the layers that need them most. Our evaluation shows that our approach is less prone to unbalanced weight distribution at initialization than previous work and that it is able to find better performing sparse subnetworks at very high sparsity levels.

Via

Access Paper or Ask Questions