Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

David Vilares

Better Benchmarking LLMs for Zero-Shot Dependency Parsing

Feb 28, 2025

Ana Ezquerro, Carlos Gómez-Rodríguez, David Vilares

Abstract:While LLMs excel in zero-shot tasks, their performance in linguistic challenges like syntactic parsing has been less scrutinized. This paper studies state-of-the-art open-weight LLMs on the task by comparing them to baselines that do not have access to the input sentence, including baselines that have not been used in this context such as random projective trees or optimal linear arrangements. The results show that most of the tested LLMs cannot outperform the best uninformed baselines, with only the newest and largest versions of LLaMA doing so for most languages, and still achieving rather low performance. Thus, accurate zero-shot syntactic parsing is not forthcoming with open LLMs.

* Accepted at NoDaLiDa/Baltic-HLT 2025

Via

Access Paper or Ask Questions

Dependency Graph Parsing as Sequence Labeling

Oct 23, 2024

Ana Ezquerro, David Vilares, Carlos Gómez-Rodríguez

Figure 1 for Dependency Graph Parsing as Sequence Labeling

Figure 2 for Dependency Graph Parsing as Sequence Labeling

Figure 3 for Dependency Graph Parsing as Sequence Labeling

Figure 4 for Dependency Graph Parsing as Sequence Labeling

Abstract:Various linearizations have been proposed to cast syntactic dependency parsing as sequence labeling. However, these approaches do not support more complex graph-based representations, such as semantic dependencies or enhanced universal dependencies, as they cannot handle reentrancy or cycles. By extending them, we define a range of unbounded and bounded linearizations that can be used to cast graph parsing as a tagging task, enlarging the toolbox of problems that can be solved under this paradigm. Experimental results on semantic dependency and enhanced UD parsing show that with a good choice of encoding, sequence-labeling dependency graph parsers combine high efficiency with accuracies close to the state of the art, in spite of their simplicity.

* Accepted at EMNLP-2024

Via

Access Paper or Ask Questions

Dancing in the syntax forest: fast, accurate and explainable sentiment analysis with SALSA

Jun 23, 2024

Carlos Gómez-Rodríguez, Muhammad Imran, David Vilares, Elena Solera, Olga Kellert

Abstract:Sentiment analysis is a key technology for companies and institutions to gauge public opinion on products, services or events. However, for large-scale sentiment analysis to be accessible to entities with modest computational resources, it needs to be performed in a resource-efficient way. While some efficient sentiment analysis systems exist, they tend to apply shallow heuristics, which do not take into account syntactic phenomena that can radically change sentiment. Conversely, alternatives that take syntax into account are computationally expensive. The SALSA project, funded by the European Research Council under a Proof-of-Concept Grant, aims to leverage recently-developed fast syntactic parsing techniques to build sentiment analysis systems that are lightweight and efficient, while still providing accuracy and explainability through the explicit use of syntax. We intend our approaches to be the backbone of a working product of interest for SMEs to use in production.

* Accepted for publication at SEPLN-CEDI2024: Seminar of the Spanish Society for Natural Language Processing at the 7th Spanish Conference on Informatics

Via

Access Paper or Ask Questions

LyS at SemEval-2024 Task 3: An Early Prototype for End-to-End Multimodal Emotion Linking as Graph-Based Parsing

May 10, 2024

Ana Ezquerro, David Vilares

Abstract:This paper describes our participation in SemEval 2024 Task 3, which focused on Multimodal Emotion Cause Analysis in Conversations. We developed an early prototype for an end-to-end system that uses graph-based methods from dependency parsing to identify causal emotion relations in multi-party conversations. Our model comprises a neural transformer-based encoder for contextualizing multimodal conversation data and a graph-based decoder for generating the adjacency matrix scores of the causal graph. We ranked 7th out of 15 valid and official submissions for Subtask 1, using textual inputs only. We also discuss our participation in Subtask 2 during post-evaluation using multi-modal inputs.

* Accepted at SemEval 2024

Via

Access Paper or Ask Questions

From Partial to Strictly Incremental Constituent Parsing

Feb 05, 2024

Ana Ezquerro, Carlos Gómez-Rodríguez, David Vilares

Abstract:We study incremental constituent parsers to assess their capacity to output trees based on prefix representations alone. Guided by strictly left-to-right generative language models and tree-decoding modules, we build parsers that adhere to a strong definition of incrementality across languages. This builds upon work that asserted incrementality, but that mostly only enforced it on either the encoder or the decoder. Finally, we conduct an analysis against non-incremental and partially incremental models.

* Accepted at EACL 2024

Via

Access Paper or Ask Questions

4 and 7-bit Labeling for Projective and Non-Projective Dependency Trees

Oct 22, 2023

Carlos Gómez-Rodríguez, Diego Roca, David Vilares

Abstract:We introduce an encoding for parsing as sequence labeling that can represent any projective dependency tree as a sequence of 4-bit labels, one per word. The bits in each word's label represent (1) whether it is a right or left dependent, (2) whether it is the outermost (left/right) dependent of its parent, (3) whether it has any left children and (4) whether it has any right children. We show that this provides an injective mapping from trees to labels that can be encoded and decoded in linear time. We then define a 7-bit extension that represents an extra plane of arcs, extending the coverage to almost full non-projectivity (over 99.9% empirical arc coverage). Results on a set of diverse treebanks show that our 7-bit encoding obtains substantial accuracy gains over the previously best-performing sequence labeling encodings.

* Accepted for publication at EMNLP 2023

Via

Access Paper or Ask Questions

On the Challenges of Fully Incremental Neural Dependency Parsing

Sep 28, 2023

Ana Ezquerro, Carlos Gómez-Rodríguez, David Vilares

Abstract:Since the popularization of BiLSTMs and Transformer-based bidirectional encoders, state-of-the-art syntactic parsers have lacked incrementality, requiring access to the whole sentence and deviating from human language processing. This paper explores whether fully incremental dependency parsing with modern architectures can be competitive. We build parsers combining strictly left-to-right neural encoders with fully incremental sequence-labeling and transition-based decoders. The results show that fully incremental parsing with modern architectures considerably lags behind bidirectional parsing, noting the challenges of psycholinguistically plausible parsing.

* Accepted at IJCNLP-AACL 2023

Via

Access Paper or Ask Questions

Assessment of Pre-Trained Models Across Languages and Grammars

Sep 20, 2023

Alberto Muñoz-Ortiz, David Vilares, Carlos Gómez-Rodríguez

Abstract:We present an approach for assessing how multilingual large language models (LLMs) learn syntax in terms of multi-formalism syntactic structures. We aim to recover constituent and dependency structures by casting parsing as sequence labeling. To do so, we select a few LLMs and study them on 13 diverse UD treebanks for dependency parsing and 10 treebanks for constituent parsing. Our results show that: (i) the framework is consistent across encodings, (ii) pre-trained word vectors do not favor constituency representations of syntax over dependencies, (iii) sub-word tokenization is needed to represent syntax, in contrast to character-based models, and (iv) occurrence of a language in the pretraining data is more important than the amount of task data when recovering syntax from the word vectors.

* Accepted at IJCNLP-AACL 2023

Via

Access Paper or Ask Questions

Contrasting Linguistic Patterns in Human and LLM-Generated Text

Aug 17, 2023

Alberto Muñoz-Ortiz, Carlos Gómez-Rodríguez, David Vilares

Abstract:We conduct a quantitative analysis contrasting human-written English news text with comparable large language model (LLM) output from 4 LLMs from the LLaMa family. Our analysis spans several measurable linguistic dimensions, including morphological, syntactic, psychometric and sociolinguistic aspects. The results reveal various measurable differences between human and AI-generated texts. Among others, human texts exhibit more scattered sentence length distributions, a distinct use of dependency and constituent types, shorter constituents, and more aggressive emotions (fear, disgust) than LLM-generated texts. LLM outputs use more numbers, symbols and auxiliaries (suggesting objective language) than human texts, as well as more pronouns. The sexist bias prevalent in human text is also expressed by LLMs.

Via

Access Paper or Ask Questions

Another Dead End for Morphological Tags? Perturbed Inputs and Parsing

May 24, 2023

Alberto Muñoz-Ortiz, David Vilares

Abstract:The usefulness of part-of-speech tags for parsing has been heavily questioned due to the success of word-contextualized parsers. Yet, most studies are limited to coarse-grained tags and high quality written content; while we know little about their influence when it comes to models in production that face lexical errors. We expand these setups and design an adversarial attack to verify if the use of morphological information by parsers: (i) contributes to error propagation or (ii) if on the other hand it can play a role to correct mistakes that word-only neural parsers make. The results on 14 diverse UD treebanks show that under such attacks, for transition- and graph-based models their use contributes to degrade the performance even faster, while for the (lower-performing) sequence labeling parsers they are helpful. We also show that if morphological tags were utopically robust against lexical perturbations, they would be able to correct parsing mistakes.

* Accepted at Findings of ACL 2023

Via

Access Paper or Ask Questions