Department of Computer Science & Engineering, University of Moratuwa, Sri Lanka
Abstract: Aspect-based Sentiment Analysis (ABSA) is a critical task in Natural Language Processing (NLP) that focuses on extracting sentiments related to specific aspects within a text, offering deep insights into customer opinions. Traditional sentiment analysis methods, while useful for determining overall sentiment, often miss the implicit opinions about particular product or service features. This paper presents a comprehensive review of the evolution of ABSA methodologies, from lexicon-based approaches to machine learning and deep learning techniques. We emphasize the recent advancements in Transformer-based models, particularly Bidirectional Encoder Representations from Transformers (BERT) and its variants, which have set new benchmarks in ABSA tasks. We focus on fine-tuning Llama and Mistral models, building hybrid models using the SetFit framework, and developing our own model by exploiting the strengths of state-of-the-art (SOTA) Transformer-based models for aspect term extraction (ATE) and aspect sentiment classification (ASC). Our hybrid model, Instruct-DeBERTa, uses SOTA InstructABSA for aspect extraction and DeBERTa-V3-base-absa-V1 for aspect sentiment classification. We utilize datasets from different domains to evaluate our model's performance. Our experiments indicate that the proposed hybrid model significantly improves the accuracy and reliability of sentiment analysis across all experimented domains. As per our findings, Instruct-DeBERTa is the best-performing model for the joint task of ATE and ASC on both the SemEval restaurant 2014 and SemEval laptop 2014 datasets. By addressing the limitations of existing methodologies, our approach provides a robust solution for understanding detailed consumer feedback, thus offering valuable insights for businesses aiming to enhance customer satisfaction and product development.
Abstract: In the rapidly evolving digital era, there is an increasing demand for concise information as individuals seek to distil key insights from various sources. Recent attention from researchers on Multi-document Summarisation (MDS) has resulted in diverse datasets covering customer reviews, academic papers, medical and legal documents, and news articles. However, the English-centric nature of these datasets has created a conspicuous void for multilingual datasets in today's globalised digital landscape, where linguistic diversity is celebrated. Media platforms such as the British Broadcasting Corporation (BBC) have disseminated news in 20+ languages for decades. With only 380 million people speaking English as their first language, accounting for less than 5% of the global population, the vast majority primarily relies on other languages. These facts underscore the need for inclusivity in MDS research, utilising resources from diverse languages. Recognising this gap, we present the Multilingual Dataset for Multi-document Summarisation (M2DS), which, to the best of our knowledge, is the first dataset of its kind. It includes document-summary pairs in five languages from BBC articles published during the 2010-2023 period. This paper introduces M2DS, emphasising its unique multilingual aspect, and includes baseline scores from state-of-the-art MDS models evaluated on our dataset.
Abstract: Since the dawn of the digitalisation era, customer feedback and online reviews have unequivocally been major sources of insights for businesses. Consequently, conducting comparative analyses of such sources has become the de facto modus operandi of any business that wishes to give itself a competitive edge over its peers and improve customer loyalty. Sentiment analysis is one such method, instrumental in gauging public interest, exposing market trends, and analysing competitors. While traditional sentiment analysis focuses on overall sentiment, as needs advance with time, it has become important to explore public opinions and sentiments on the specific subjects, products and services mentioned in reviews at a finer-grained level. To this end, Aspect-based Sentiment Analysis (ABSA), supported by advances in Artificial Intelligence (AI) techniques which have contributed to a paradigm shift from simple word-level analysis to tone- and context-aware analyses, focuses on identifying specific aspects within the text and determining the sentiment associated with each aspect. In this study, we compare several deep-NN methods for ABSA on two benchmark datasets (Restaurant14 and Laptop-14) and find that FAST LSA obtains the best overall results of 87.6% and 82.6% accuracy, but does not surpass LSA+DeBERTa, which reports 90.33% and 86.21% accuracy respectively.
Abstract: Manual data annotation is an important NLP task, but one that takes a considerable amount of resources and effort. In spite of the costs, labeling and categorizing entities is essential for NLP tasks such as semantic evaluation. Even though annotation can be done by non-experts in most cases, the process is costly because it requires human labor. Another major challenge encountered in data annotation is maintaining annotation consistency. Annotation efforts are typically carried out by teams of multiple annotators. The annotations need to remain consistent with both the domain truth and the annotation format while reducing human errors. Annotating a specialized domain that deviates significantly from the general domain, such as fantasy literature, will see a lot of human error and annotator disagreement. It is therefore vital that proper guidelines and error reduction mechanisms are enforced. One way to enforce these constraints is to use a specialized application. Such an app can ensure that the annotations are consistent, and the labels can be pre-defined or restricted, reducing the room for errors. In this paper, we present SHADE, an annotation software that can be used to annotate entities in the high fantasy literature domain, specifically in Dungeons and Dragons lore extracted from the Forgotten Realms Fandom Wiki.
Abstract: We analysed a sample of NLP research papers archived in the ACL Anthology in an attempt to quantify the degree of openness and the benefit of such an open culture in the NLP community. We observe that papers published in different NLP venues show different patterns related to artefact reuse. We also note that more than 30% of the papers we analysed do not release their artefacts publicly, despite promising to do so. Further, we observe a wide language-wise disparity in publicly available NLP-related artefacts.
Abstract: Named Entity Recognition (NER) is a sequence classification Natural Language Processing task where entities are identified in the text and classified into predefined categories. It acts as a foundation for most information extraction systems. Dungeons and Dragons (D&D) is an open-ended tabletop fantasy game with its own diverse lore. D&D entities are domain-specific and are thus unrecognizable by even the state-of-the-art off-the-shelf NER systems, as these systems are trained on general data for pre-defined categories such as person (PERS), location (LOC), organization (ORG), and miscellaneous (MISC). For meaningful extraction of information from fantasy text, the entities need to be classified into domain-specific entity categories and the models need to be fine-tuned on a domain-relevant corpus. This work uses available lore of monsters in the D&D domain to fine-tune Trankit, which is a prolific NER framework that uses a pre-trained model for NER. Upon this training, the system acquires the ability to extract monster names from relevant domain documents under a novel NER tag. This work compares the accuracy of the monster name identification against the zero-shot Trankit model and two FLAIR models. The fine-tuned Trankit model achieves an 87.86% F1 score, surpassing all the other considered models.
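For readers unfamiliar with the metric, the F1 score reported above is the harmonic mean of precision and recall. The following is a minimal sketch of an entity-level computation, assuming exact-match scoring over (start, end, tag) spans; the spans, the `MONSTER` tag name, and the `prf1` helper are hypothetical illustrations and are not taken from the paper.

```python
def prf1(gold, predicted):
    """Return (precision, recall, F1) for two collections of entity spans."""
    gold, predicted = set(gold), set(predicted)
    true_pos = len(gold & predicted)  # spans that match exactly
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical spans: (start_token, end_token, tag). "MONSTER" is an assumed tag name.
gold = [(0, 2, "MONSTER"), (7, 8, "MONSTER"), (12, 14, "MONSTER")]
predicted = [(0, 2, "MONSTER"), (7, 8, "MONSTER"), (20, 21, "MONSTER")]

precision, recall, f1 = prf1(gold, predicted)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Exact-match scoring is only one possible convention; partial-overlap scoring would give different numbers on the same predictions.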
Abstract: We conducted a detailed analysis of the quality of web-mined corpora for two low-resource languages (making three language pairs: English-Sinhala, English-Tamil and Sinhala-Tamil). We ranked each corpus according to a similarity measure and carried out an intrinsic and extrinsic evaluation on different portions of this ranked corpus. We show that there are significant quality differences between different portions of web-mined corpora and that the quality varies across languages and datasets. We also show that, for some web-mined datasets, Neural Machine Translation (NMT) models trained with their highest-ranked 25k portion can be on par with human-curated datasets.
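The ranking step described above can be pictured as scoring each sentence pair, sorting by score, and keeping the top portion. The sketch below assumes cosine similarity over precomputed sentence vectors as a stand-in; the abstract does not name the similarity measure, and the toy vectors, field names, and `top_portion` helper are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_portion(pairs, k):
    """Rank sentence pairs by similarity (best first) and keep the first k."""
    ranked = sorted(
        pairs,
        key=lambda p: cosine(p["src_vec"], p["tgt_vec"]),
        reverse=True,
    )
    return ranked[:k]


# Toy stand-ins for sentence embeddings of aligned source/target sentences.
corpus = [
    {"id": "a", "src_vec": [1.0, 0.0], "tgt_vec": [1.0, 0.1]},
    {"id": "b", "src_vec": [1.0, 0.0], "tgt_vec": [0.0, 1.0]},
    {"id": "c", "src_vec": [0.5, 0.5], "tgt_vec": [0.5, 0.5]},
]

best = top_portion(corpus, 2)
best_ids = [p["id"] for p in best]
print(best_ids)
```

In the setting of the abstract, `k` would correspond to a portion size such as 25k sentence pairs, and the retained portion would then be used to train an NMT model.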
Abstract: Since their inception, embeddings have become a primary ingredient in many flavours of Natural Language Processing (NLP) tasks, supplanting earlier types of representation. Even though multilingual embeddings have been used for the increasing number of multilingual tasks, work on low-resource languages such as Sinhala tends to focus more on monolingual embeddings due to the scarcity of parallel training data. When it comes to the aforementioned multilingual tasks, it is challenging to utilize these monolingual embeddings: even if the embedding spaces have a similar geometric arrangement due to an identical training process, the embeddings of the languages considered are not aligned. This is solved by the embedding alignment task. Even here, high-resource language pairs are in the limelight, while low-resource languages such as Sinhala, which are in dire need of help, seem to have fallen by the wayside. In this paper, we try to align Sinhala and English word embedding spaces based on available alignment techniques and introduce a benchmark for Sinhala language embedding alignment. In addition, to facilitate supervised alignment, we introduce Sinhala-English alignment datasets as an intermediate task. These datasets serve as our anchor datasets for supervised word embedding alignment. Even though we do not obtain results comparable to those for high-resource languages such as French, German, or Chinese, we believe our work lays the groundwork for more specialized alignment between English and Sinhala embeddings.
Abstract: Many NLP tasks, although well-resolved for general English, face challenges in specific domains like fantasy literature. This is evident in Named Entity Recognition (NER), which detects and categorizes entities in text. We analyzed 10 NER models on 7 Dungeons and Dragons (D&D) adventure books to assess domain-specific performance. Using open-source Large Language Models, we annotated named entities in these books and evaluated each model's precision. Our findings indicate that, without modifications, Flair, Trankit, and spaCy outperform the others in identifying named entities in the D&D context.
Abstract: This paper aims to evaluate state-of-the-art models for Multi-document Summarization (MDS) on different types of datasets in various domains and to investigate the limitations of existing models in order to determine future research directions. To this end, we conducted an extensive literature review to identify state-of-the-art models and datasets. We analyzed the performance of the PRIMERA and PEGASUS models on the BigSurvey-MDS and MS$^2$ datasets, which posed unique challenges due to their varied domains. Our findings show that the general-purpose pre-trained model LED outperforms PRIMERA and PEGASUS on the MS$^2$ dataset. We used the ROUGE score as a performance metric to evaluate the identified models on different datasets. Our study provides valuable insights into the models' strengths and weaknesses, as well as their applicability in different domains. This work serves as a reference for future MDS research and contributes to the development of accurate and robust models which can be utilized on demanding datasets with academically and/or scientifically complex data as well as generalized, relatively simple datasets.
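To illustrate the ROUGE metric mentioned above, the following is a minimal sketch of ROUGE-1 (unigram overlap with clipped counts). It assumes plain whitespace tokenisation with lower-casing and no stemming, so its numbers will not match those of a full ROUGE toolkit; the `rouge1` helper and the example sentences are hypothetical.

```python
from collections import Counter


def rouge1(reference, candidate):
    """Return (precision, recall, F1) based on clipped unigram overlap."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # per-token minimum of the two counts
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


reference = "the model summarises several related documents"
candidate = "the model summarises documents"

precision, recall, f1 = rouge1(reference, candidate)
print(f"ROUGE-1 P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Here the candidate summary contains only words from the reference, so precision is perfect while recall is penalised for the two missing words.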