Murathan Kurfalı

Climate-Eval: A Comprehensive Benchmark for NLP Tasks Related to Climate Change

May 24, 2025

Murathan Kurfalı, Shorouq Zahra, Joakim Nivre, Gabriele Messori

Abstract: Climate-Eval is a comprehensive benchmark designed to evaluate natural language processing models across a broad range of tasks related to climate change. Climate-Eval aggregates existing datasets along with a newly developed news classification dataset, created specifically for this release. This results in a benchmark of 25 tasks based on 13 datasets, covering key aspects of climate discourse, including text classification, question answering, and information extraction. Our benchmark provides a standardized evaluation suite for systematically assessing the performance of large language models (LLMs) on these tasks. Additionally, we conduct an extensive evaluation of open-source LLMs (ranging from 2B to 70B parameters) in both zero-shot and few-shot settings, analyzing their strengths and limitations in the domain of climate change.

* Accepted to ClimateNLP 2025@ACL

Lightweight Connective Detection Using Gradient Boosting

Apr 21, 2024

Mustafa Erolcan Er, Murathan Kurfalı, Deniz Zeyrek

Abstract: In this work, we introduce a lightweight discourse connective detection system. Employing gradient boosting trained on straightforward, low-complexity features, this proposed approach sidesteps the computational demands of the current approaches that rely on deep neural networks. Considering its simplicity, our approach achieves competitive results while offering significant gains in terms of time even on CPU. Furthermore, the stable performance across two unrelated languages suggests the robustness of our system in the multilingual scenario. The model is designed to support the annotation of discourse relations, particularly in scenarios with limited resources, while minimizing performance loss.

* 7 pages, 2 figures, 5 tables

Evaluation of really good grammatical error correction

Aug 17, 2023

Robert Östling, Katarina Gillholm, Murathan Kurfalı, Marie Mattson, Mats Wirén

Abstract: Although rarely stated, in practice, Grammatical Error Correction (GEC) encompasses various models with distinct objectives, ranging from grammatical error detection to improving fluency. Traditional evaluation methods fail to capture the full range of system capabilities and objectives. Reference-based evaluations are limited in capturing the wide variety of possible corrections, suffer from biases introduced during reference creation, and are prone to favor fixing local errors over overall text improvement. The emergence of large language models (LLMs) has further highlighted the shortcomings of these evaluation strategies, emphasizing the need for a paradigm shift in evaluation methodology. In the current study, we perform a comprehensive evaluation of various GEC systems using a recently published dataset of Swedish learner texts. The evaluation is performed using established evaluation metrics as well as human judges. We find that GPT-3 in a few-shot setting by far outperforms previous grammatical error correction systems for Swedish, a language comprising only 0.11% of its training data. We also find that current evaluation methods contain undesirable biases that a human evaluation is able to reveal. We suggest using human post-editing of GEC system outputs to analyze the amount of change required to reach native-level human performance on the task, and provide a dataset annotated with human post-edits and assessments of grammaticality, fluency and meaning preservation of GEC system outputs.

Language Embeddings Sometimes Contain Typological Generalizations

Jan 19, 2023

Robert Östling, Murathan Kurfalı

Abstract: To what extent can neural network models learn generalizations about language structure, and how do we find out what they have learned? We explore these questions by training neural models for a range of natural language processing tasks on a massively multilingual dataset of Bible translations in 1295 languages. The learned language representations are then compared to existing typological databases as well as to a novel set of quantitative syntactic and morphological features obtained through annotation projection. We conclude that some generalizations are surprisingly close to traditional features from linguistic typology, but that most of our models, as well as those of previous work, do not appear to have made linguistically meaningful generalizations. Careful attention to details in the evaluation turns out to be essential to avoid false positives. Furthermore, to encourage continued work in this field, we release several resources covering most or all of the languages in our data: (i) multiple sets of language representations, (ii) multilingual word embeddings, (iii) projected and predicted syntactic and morphological features, (iv) software to provide linguistically sound evaluations of language representations.

Probing Multilingual Language Models for Discourse

Jun 09, 2021

Murathan Kurfalı, Robert Östling

Abstract: Pre-trained multilingual language models have become an important building block in multilingual natural language processing. In the present paper, we investigate a range of such models to find out how well they transfer discourse-level knowledge across languages. This is done with a systematic evaluation on a broader set of discourse-level tasks than has previously been assembled. We find that the XLM-RoBERTa family of models consistently shows the best performance, by simultaneously being good monolingual models and degrading relatively little in a zero-shot setting. Our results also indicate that model distillation may hurt the ability of cross-lingual transfer of sentence representations, while language dissimilarity at most has a modest effect. We hope that our test suite, covering 5 tasks with a total of 22 languages in 10 distinct families, will serve as a useful evaluation platform for multilingual performance at and beyond the sentence level.

* To be presented at RepL4NLP 2021

Let's be explicit about that: Distant supervision for implicit discourse relation classification via connective prediction

Jun 06, 2021

Murathan Kurfalı, Robert Östling

Abstract: In implicit discourse relation classification, we want to predict the relation between adjacent sentences in the absence of any overt discourse connectives. This is challenging even for humans, leading to a shortage of annotated data, a fact that makes the task even more difficult for supervised machine learning approaches. In the current study, we perform implicit discourse relation classification without relying on any labeled implicit relation. We sidestep the lack of data through explicitation of implicit relations to reduce the task to two sub-problems: language modeling and explicit discourse relation classification, a much easier problem. Our experimental results show that this method can even marginally outperform the state-of-the-art, in spite of being much simpler than alternative models of comparable performance. Moreover, we show that the achieved performance is robust across domains as suggested by the zero-shot experiments on a completely different domain. This indicates that recent advances in language modeling have made language models sufficiently good at capturing inter-sentence relations without the help of explicit discourse markers.

* To be presented at Unimplicit 2021

Labeling Explicit Discourse Relations using Pre-trained Language Models

Jun 21, 2020

Murathan Kurfalı

Abstract: Labeling explicit discourse relations is one of the most challenging sub-tasks of shallow discourse parsing, where the goal is to identify the discourse connectives and the boundaries of their arguments. The state-of-the-art models achieve an F-score slightly above 45% by using hand-crafted features. The current paper investigates the efficacy of pre-trained language models in this task. We find that pre-trained language models, when fine-tuned, are powerful enough to replace the linguistic features. We evaluate our model on PDTB 2.0 and report state-of-the-art results in the extraction of the full relation. This is the first time a model has outperformed the knowledge-intensive models without employing any linguistic features.

* To be presented at TSD 2020

Zero-shot transfer for implicit discourse relation classification

Jul 30, 2019

Murathan Kurfalı, Robert Östling

Abstract: Automatically classifying the relation between sentences in a discourse is a challenging task, in particular when there is no overt expression of the relation. It is made even more challenging by the fact that annotated training data exists only for a small number of languages, such as English and Chinese. We present a new system using zero-shot transfer learning for implicit discourse relation classification, where the only resource used for the target language is unannotated parallel text. This system is evaluated on the discourse-annotated TED-MDB parallel corpus, where it obtains good results for all seven languages using only English training data.

* To be presented at SIGDIAL 2019

A Trie-Structured Bayesian Model for Unsupervised Morphological Segmentation

Apr 24, 2017

Murathan Kurfalı, Ahmet Üstün, Burcu Can

Abstract: In this paper, we introduce a trie-structured Bayesian model for unsupervised morphological segmentation. We adopt prior information from different sources in the model. We use neural word embeddings to discover words that are morphologically derived from each other and are thereby semantically similar. We use letter successor variety counts obtained from tries that are built using neural word embeddings. Our results show that using different information sources such as neural word embeddings and letter successor variety as prior information improves morphological segmentation in a Bayesian model. Our model outperforms other unsupervised morphological segmentation models on Turkish and gives promising results on English and German with scarce resources.

* 12 pages, accepted and presented at CICLING 2017, the 18th International Conference on Intelligent Text Processing and Computational Linguistics

Turkish PoS Tagging by Reducing Sparsity with Morpheme Tags in Small Datasets

Mar 10, 2017

Burcu Can, Ahmet Üstün, Murathan Kurfalı

Abstract: Sparsity is one of the major problems in natural language processing. The problem becomes even more severe in agglutinating languages, which are highly prone to inflection. We deal with sparsity in Turkish by adopting morphological features for part-of-speech tagging. We learn inflectional and derivational morpheme tags in Turkish by using conditional random fields (CRFs), and we employ the morpheme tags in part-of-speech (PoS) tagging by using hidden Markov models (HMMs) to mitigate sparsity. Results show that using morpheme tags in PoS tagging helps alleviate the sparsity in emission probabilities. Our model outperforms other hidden Markov model based PoS tagging models for small training datasets in Turkish. We obtain an accuracy of 94.1% in morpheme tagging and 89.2% in PoS tagging on a 5K training dataset.

* 13 pages, accepted and presented at the 17th International Conference on Intelligent Text Processing and Computational Linguistics (CICLING)
