Abstract: Heterogeneous graph neural networks have recently gained attention for long document summarization, modeling extraction as a node classification task. Although effective, these models often require external tools or additional machine learning models to define graph components, producing highly complex and less intuitive structures. We present GraphLSS, a heterogeneous graph construction for long document extractive summarization that incorporates Lexical, Structural, and Semantic features. It defines two levels of information (words and sentences) and four types of edges (sentence semantic similarity, sentence occurrence order, word in sentence, and word semantic similarity) without any need for auxiliary learning models. Experiments on two benchmark datasets show that GraphLSS is competitive with top-performing graph-based methods and outperforms recent non-graph models. We release our code on GitHub.
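To make the described graph construction concrete, the following is a minimal sketch of a GraphLSS-style heterogeneous graph built with PyTorch Geometric's HeteroData. The feature dimensions, random placeholder features, similarity thresholds, and edge-type names are illustrative assumptions, not the paper's actual configuration.

```python
# Minimal sketch of a two-level (word/sentence) heterogeneous graph with the
# four edge types named in the abstract. All numbers below are assumptions.
import torch
import torch.nn.functional as F
from torch_geometric.data import HeteroData

num_words, num_sents, dim = 50, 10, 128

data = HeteroData()
# Two node levels: words and sentences (random features as placeholders;
# the paper would use lexical/semantic embeddings instead).
data['word'].x = torch.randn(num_words, dim)
data['sentence'].x = torch.randn(num_sents, dim)

# 1) Sentence occurrence order: connect each sentence to its successor.
order = torch.arange(num_sents - 1)
data['sentence', 'next', 'sentence'].edge_index = torch.stack([order, order + 1])

# 2) Sentence semantic similarity: cosine similarity above a chosen threshold
#    (0.5 here is an arbitrary illustrative value).
sim = F.cosine_similarity(
    data['sentence'].x.unsqueeze(1), data['sentence'].x.unsqueeze(0), dim=-1)
mask = (sim > 0.5) & ~torch.eye(num_sents, dtype=torch.bool)
src, dst = mask.nonzero(as_tuple=True)
data['sentence', 'similar', 'sentence'].edge_index = torch.stack([src, dst])

# 3) Word-in-sentence membership (random assignment for illustration; in
#    practice this follows the document's actual tokenization).
w = torch.arange(num_words)
data['word', 'in', 'sentence'].edge_index = torch.stack(
    [w, torch.randint(num_sents, (num_words,))])

# 4) Word semantic similarity, thresholded the same way as for sentences.
wsim = F.cosine_similarity(
    data['word'].x.unsqueeze(1), data['word'].x.unsqueeze(0), dim=-1)
wmask = (wsim > 0.7) & ~torch.eye(num_words, dtype=torch.bool)
ws, wd = wmask.nonzero(as_tuple=True)
data['word', 'similar', 'word'].edge_index = torch.stack([ws, wd])

print(data)  # summary of node/edge types and tensor shapes
```

A heterogeneous GNN would then classify the sentence nodes of this graph as summary-worthy or not; the sketch only covers graph construction.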
Abstract: Given the success of Graph Neural Networks (GNNs) for structure-aware machine learning, numerous studies have explored their application to text classification as an alternative to traditional feature representation models. However, most studies consider only a specific domain and validate on data with particular characteristics. This work presents an extensive empirical investigation of graph-based text representation methods proposed for text classification, identifying practical implications and open challenges in the field. We compare several GNN architectures as well as BERT across five datasets, encompassing both short and long documents. The results show that: i) graph performance is highly related to the textual input features and domain, ii) despite its outstanding performance, BERT has difficulty converging when dealing with short texts, and iii) graph methods are particularly beneficial for longer documents.