Kemal Kurniawan

Training and Evaluating with Human Label Variation: An Empirical Study

Feb 03, 2025

Kemal Kurniawan, Meladel Mistica, Timothy Baldwin, Jey Han Lau

Abstract: Human label variation (HLV) challenges the standard assumption that an example has a single ground truth, instead embracing the natural variation in human labelling to train and evaluate models. While various training methods and metrics for HLV have been proposed, there has been no systematic meta-evaluation of HLV evaluation metrics, contributing to the lack of clarity about which HLV training method is best. We propose new evaluation metrics and training methods and empirically meta-evaluate HLV evaluation metrics. We find that training on either disaggregated annotations or soft labels often performs best across metrics, and that our proposed soft metric correlates best with human preference.
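Illustrative sketch (not taken from the paper): the snippet below shows one common way to turn several annotators' labels for a single example into a soft label, next to the usual majority-voted hard label. The function names, the label set, and the example annotations are hypothetical, and the paper's own training methods and metrics are not reproduced here.

```python
from collections import Counter


def soft_label(annotations, classes):
    """Relative frequency of each class among the annotators' labels."""
    counts = Counter(annotations)
    total = len(annotations)
    return [counts[c] / total for c in classes]


def majority_label(annotations):
    """Single hard label chosen by majority vote (ties broken by first seen)."""
    return Counter(annotations).most_common(1)[0][0]


# Hypothetical label set and annotations for one example.
classes = ["neg", "neu", "pos"]
annotations = ["pos", "pos", "neu", "pos", "neg"]

print(soft_label(annotations, classes))  # distribution over neg, neu, pos
print(majority_label(annotations))       # the most frequent label
```

The soft label keeps the minority votes that majority voting discards, which is the information HLV-aware training tries to use.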

MoDEM: Mixture of Domain Expert Models

Oct 09, 2024

Toby Simonds, Kemal Kurniawan, Jey Han Lau

Abstract: We propose a novel approach to enhancing the performance and efficiency of large language models (LLMs) by combining domain prompt routing with domain-specialized models. We introduce a system that utilizes a BERT-based router to direct incoming prompts to the most appropriate domain expert model. These expert models are specifically tuned for domains such as health, mathematics and science. Our research demonstrates that this approach can significantly outperform general-purpose models of comparable size, leading to a superior performance-to-cost ratio across various benchmarks. The implications of this study suggest a potential paradigm shift in LLM development and deployment. Rather than focusing solely on creating increasingly large, general-purpose models, the future of AI may lie in developing ecosystems of smaller, highly specialized models coupled with sophisticated routing systems. This approach could lead to more efficient resource utilization, reduced computational costs, and superior overall performance.
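Illustrative sketch (not taken from the paper): the routing idea described above can be pictured as a classifier that picks a domain, followed by a dispatch to that domain's model. Everything in the snippet is hypothetical. In particular, `toy_router` is a keyword matcher standing in for the BERT-based router, and the "experts" are placeholder functions rather than language models.

```python
def toy_router(prompt):
    """Keyword-based stand-in for a learned router; purely illustrative."""
    text = prompt.lower()
    if any(word in text for word in ("integral", "equation", "prove")):
        return "mathematics"
    if any(word in text for word in ("symptom", "dosage", "diagnosis")):
        return "health"
    return "general"


def route(prompt, router, experts):
    """Send the prompt to the expert for the predicted domain."""
    domain = router(prompt)
    # Fall back to the general model if no expert exists for the domain.
    expert = experts.get(domain, experts["general"])
    return expert(prompt)


# Placeholder experts: each just tags the prompt with its domain.
experts = {
    "mathematics": lambda p: f"[math expert] {p}",
    "health": lambda p: f"[health expert] {p}",
    "general": lambda p: f"[general model] {p}",
}

print(route("Solve the equation x + 2 = 5", toy_router, experts))
```

Only the selected expert runs for a given prompt, which is where the cost saving relative to one large general-purpose model would come from.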

Generating bilingual example sentences with large language models as lexicography assistants

Oct 04, 2024

Raphael Merx, Ekaterina Vylomova, Kemal Kurniawan

Abstract: We present a study of LLMs' performance in generating and rating example sentences for bilingual dictionaries across languages with varying resource levels: French (high-resource), Indonesian (mid-resource), and Tetun (low-resource), with English as the target language. We evaluate the quality of LLM-generated examples against the GDEX (Good Dictionary EXample) criteria: typicality, informativeness, and intelligibility. Our findings reveal that while LLMs can generate reasonably good dictionary examples, their performance degrades significantly for lower-resourced languages. We also observe high variability in human preferences for example quality, reflected in low inter-annotator agreement rates. To address this, we demonstrate that in-context learning can successfully align LLMs with individual annotator preferences. Additionally, we explore the use of pre-trained language models for automated rating of examples, finding that sentence perplexity serves as a good proxy for typicality and intelligibility in higher-resourced languages. Our study also contributes a novel dataset of 600 ratings for LLM-generated sentence pairs, and provides insights into the potential of LLMs in reducing the cost of lexicographic work, particularly for low-resource languages.
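Illustrative sketch (not taken from the paper): sentence perplexity, mentioned above as a proxy for typicality and intelligibility, is conventionally the exponential of the negative mean token log-probability. The snippet assumes the per-token natural-log probabilities have already been obtained from some language model; the function name and the example values are hypothetical.

```python
import math


def perplexity(token_logprobs):
    """exp of the negative mean natural-log probability per token."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


# Hypothetical log-probabilities for two four-token sentences.
typical_sentence = [math.log(0.5)] * 4    # each token fairly predictable
unusual_sentence = [math.log(0.05)] * 4   # each token surprising

print(perplexity(typical_sentence))
print(perplexity(unusual_sentence))
```

Lower perplexity means the model found the sentence more predictable, so candidate examples could be ranked in ascending order of this score.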

To Aggregate or Not to Aggregate. That is the Question: A Case Study on Annotation Subjectivity in Span Prediction

Aug 05, 2024

Kemal Kurniawan, Meladel Mistica, Timothy Baldwin, Jey Han Lau

Abstract: This paper explores the task of automatic prediction of text spans in a legal problem description that support a legal area label. We use a corpus of problem descriptions written by laypeople in English that is annotated by practising lawyers. Inherent subjectivity exists in our task because legal area categorisation is a complex task, and lawyers often have different views on a problem, especially in the face of legally imprecise descriptions of issues. Experiments show that training on majority-voted spans outperforms training on disaggregated ones.

* Accepted at WASSA 2024

NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages

May 31, 2022

Genta Indra Winata, Alham Fikri Aji, Samuel Cahyawijaya, Rahmad Mahendra, Fajri Koto, Ade Romadhony, Kemal Kurniawan, David Moeljadi, Radityo Eko Prasojo, Pascale Fung (+4 more)

Abstract: Natural language processing (NLP) has a significant impact on society via technologies such as machine translation and search engines. Despite its success, NLP technology is only widely available for high-resource languages such as English and Chinese, while it remains inaccessible to many languages due to the unavailability of data resources and benchmarks. In this work, we focus on developing resources for languages in Indonesia. Although Indonesia is the second most linguistically diverse country, most of its languages are categorized as endangered and some are even extinct. We develop the first-ever parallel resource for 10 low-resource languages in Indonesia. Our resource includes datasets, a multi-task benchmark, and lexicons, as well as a parallel Indonesian-English dataset. We provide extensive analyses and describe the challenges when creating such resources. We hope that our work can spark NLP research on Indonesian and other underrepresented languages.

* Preprint

One Country, 700+ Languages: NLP Challenges for Underrepresented Languages and Dialects in Indonesia

Mar 24, 2022

Alham Fikri Aji, Genta Indra Winata, Fajri Koto, Samuel Cahyawijaya, Ade Romadhony, Rahmad Mahendra, Kemal Kurniawan, David Moeljadi, Radityo Eko Prasojo, Timothy Baldwin (+2 more)

Abstract: NLP research is impeded by a lack of resources and awareness of the challenges presented by underrepresented languages and dialects. Focusing on the languages spoken in Indonesia, the second most linguistically diverse and the fourth most populous nation in the world, we provide an overview of the current state of NLP research for Indonesia's 700+ languages. We highlight challenges in Indonesian NLP and how these affect the performance of current NLP systems. Finally, we provide general recommendations to help develop NLP technology not only for languages of Indonesia but also other underrepresented languages.

* Accepted in ACL 2022

Unsupervised Cross-Lingual Transfer of Structured Predictors without Source Data

Oct 08, 2021

Kemal Kurniawan, Lea Frermann, Philip Schulz, Trevor Cohn

Abstract: Providing technologies to communities or domains where training data is scarce or protected, e.g. for privacy reasons, is becoming increasingly important. To that end, we generalise methods for unsupervised transfer from multiple input models for structured prediction. We show that the means of aggregating over the input models is critical, and that multiplying marginal probabilities of substructures to obtain high-probability structures for distant supervision is substantially better than taking the union of such structures over the input models, as done in prior work. Testing on 18 languages, we demonstrate that the method works in a cross-lingual setting, considering both dependency parsing and part-of-speech structured prediction problems. Our analyses show that the proposed method produces less noisy labels for the distant supervision.
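Illustrative sketch (not taken from the paper): the contrast between the two aggregation strategies named above can be shown on a single token in a part-of-speech setting. The snippet is a simplified, per-token toy under assumed inputs, not the paper's procedure. The function names and the three source-model distributions are hypothetical.

```python
def product_of_marginals(marginals):
    """Multiply per-model tag marginals for one token, then renormalise."""
    tags = list(marginals[0])
    product = {tag: 1.0 for tag in tags}
    for model_marginal in marginals:
        for tag in tags:
            product[tag] *= model_marginal[tag]
    total = sum(product.values())
    return {tag: value / total for tag, value in product.items()}


def union_of_top_tags(marginals):
    """Collect each model's single most probable tag into one set."""
    return {max(m, key=m.get) for m in marginals}


# Hypothetical tag marginals for one token from three source models.
models = [
    {"NOUN": 0.6, "VERB": 0.3, "ADJ": 0.1},
    {"NOUN": 0.5, "VERB": 0.1, "ADJ": 0.4},
    {"NOUN": 0.3, "VERB": 0.6, "ADJ": 0.1},
]

combined = product_of_marginals(models)
print(max(combined, key=combined.get))  # tag favoured after multiplying
print(union_of_top_tags(models))        # every tag some model ranked first
```

In this toy case the product concentrates mass on the tag that all models find plausible, whereas the union also admits a tag that only one model preferred, which is one intuition for why a product might yield less noisy supervision.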

PPT: Parsimonious Parser Transfer for Unsupervised Cross-Lingual Adaptation

Jan 27, 2021

Kemal Kurniawan, Lea Frermann, Philip Schulz, Trevor Cohn

Abstract: Cross-lingual transfer is a leading technique for parsing low-resource languages in the absence of explicit supervision. Simple 'direct transfer' of a learned model based on a multilingual input encoding has provided a strong benchmark. This paper presents a method for unsupervised cross-lingual transfer that improves over direct transfer systems by using their output as implicit supervision as part of self-training on unlabelled text in the target language. The method assumes minimal resources and provides maximal flexibility by (a) accepting any pre-trained arc-factored dependency parser; (b) assuming no access to source language data; (c) supporting both projective and non-projective parsing; and (d) supporting multi-source transfer. With English as the source language, we show significant improvements over state-of-the-art transfer models on both distant and nearby languages, despite our conceptually simpler approach. We provide analyses of the choice of source languages for multi-source transfer, and the advantage of non-projective parsing. Our code is available online.

* Accepted at EACL 2021

KaWAT: A Word Analogy Task Dataset for Indonesian

Jun 17, 2019

Kemal Kurniawan

Abstract: We introduced KaWAT (Kata Word Analogy Task), a new word analogy task dataset for Indonesian. We used it to evaluate several existing pretrained Indonesian word embeddings, as well as embeddings trained on an Indonesian online news corpus. We also tested them on two downstream tasks and found that pretrained word embeddings helped either by reducing the training epochs or yielding significant performance gains.
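Illustrative sketch (not taken from the paper): word analogy tasks are commonly scored with the vector-offset method (often called 3CosAdd), which answers "a is to b as c is to ?" by finding the word closest to b - a + c. The abstract does not say which scoring method was used, so this is an assumption. The function names and the three-dimensional toy embeddings are hypothetical, not KaWAT data.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def solve_analogy(a, b, c, embeddings):
    """Return the word d that best completes 'a is to b as c is to d'."""
    target = [
        vb - va + vc
        for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])
    ]
    # The three query words are excluded from the candidates, as is usual.
    candidates = [w for w in embeddings if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(target, embeddings[w]))


# Toy vectors: pria = man, wanita = woman, raja = king, ratu = queen, anak = child.
toy_embeddings = {
    "pria": [0.0, 1.0, 0.0],
    "wanita": [0.0, 0.0, 1.0],
    "raja": [1.0, 1.0, 0.0],
    "ratu": [1.0, 0.0, 1.0],
    "anak": [0.0, 0.5, 0.5],
}

print(solve_analogy("pria", "wanita", "raja", toy_embeddings))  # ratu
```

An embedding's score on such a dataset would then be the fraction of analogy questions for which the returned word matches the expected answer.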

* Extended abstract

IndoSum: A New Benchmark Dataset for Indonesian Text Summarization

Oct 12, 2018

Kemal Kurniawan, Samuel Louvan

Abstract: Automatic text summarization is generally considered a challenging task in the NLP community. One of the challenges is that large, publicly available datasets are relatively rare and difficult to construct. The problem is even worse for low-resource languages such as Indonesian. In this paper, we present IndoSum, a new benchmark dataset for Indonesian text summarization. The dataset consists of news articles and manually constructed summaries. Notably, the dataset is almost 200x larger than the previous Indonesian summarization dataset of the same domain. We evaluated various extractive summarization approaches and obtained encouraging results which demonstrate the usefulness of the dataset and provide baselines for future research. The code and the dataset are available online under permissive licenses.

* Accepted in IALP 2018
