Erick Mendez Guzman

LLM-CARD: Towards a Description and Landscape of Large Language Models

Sep 25, 2024

Shengwei Tian, Lifeng Han, Erick Mendez Guzman, Goran Nenadic

Abstract: With the rapid growth of the Natural Language Processing (NLP) field, a wide variety of Large Language Models (LLMs) continue to emerge for diverse NLP tasks. As more papers are published, researchers and developers face information overload. It is therefore important to develop a system that can automatically extract and organise key information about LLMs from academic papers into an LLM model card. This work aims to develop such a pioneering system, using Named Entity Recognition (NER) and Relation Extraction (RE) methods to automatically extract key information about large language models from papers and help researchers access it efficiently. The extracted features include the model licence, model name, and model application. With these features, a model card can be formed for each paper. As a data contribution, 106 academic papers were processed using three defined dictionaries: LLM name, licence, and application. Dictionary lookup extracted 11,051 sentences, and the dataset was constructed through manual review, with a final selection of 129 sentences that link a model name to a licence and 106 sentences that link a model name to an application.
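The dictionary-lookup step mentioned in the abstract could look roughly like the sketch below. It is only an illustration, not the authors' implementation: the three small dictionaries, the function names `find_terms` and `candidate_sentences`, and the rule that a sentence must mention a model name together with a licence or an application are all assumptions made for this example.

```python
import re

# Hypothetical, tiny dictionaries for illustration only;
# the paper's actual dictionaries are not reproduced here.
MODEL_NAMES = {"BERT", "LLaMA", "GPT-3"}
LICENCES = {"Apache 2.0", "MIT"}
APPLICATIONS = {"machine translation", "question answering"}


def find_terms(sentence, dictionary):
    """Return the dictionary terms found in the sentence.

    Matching is case-insensitive and requires that the term is not
    embedded inside a longer word.
    """
    found = []
    for term in sorted(dictionary):
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if re.search(pattern, sentence, flags=re.IGNORECASE):
            found.append(term)
    return found


def candidate_sentences(sentences):
    """Keep sentences that mention a model name plus a licence or application.

    Assumption: co-occurrence is used as the filter; in the paper the
    final sentences were selected by manual review.
    """
    candidates = []
    for sentence in sentences:
        names = find_terms(sentence, MODEL_NAMES)
        licences = find_terms(sentence, LICENCES)
        applications = find_terms(sentence, APPLICATIONS)
        if names and (licences or applications):
            candidates.append({
                "sentence": sentence,
                "names": names,
                "licences": licences,
                "applications": applications,
            })
    return candidates
```

Candidates produced this way would still need the manual review the abstract describes, since co-occurrence of two terms in a sentence does not guarantee a real relation between them.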

* ongoing work, 16 pages

RaFoLa: A Rationale-Annotated Corpus for Detecting Indicators of Forced Labour

May 05, 2022

Erick Mendez Guzman, Viktor Schlegel, Riza Batista-Navarro

Abstract: Forced labour is the most common type of modern slavery, and it is increasingly gaining the attention of the research and social community. Recent studies suggest that artificial intelligence (AI) holds immense potential for augmenting anti-slavery action. However, AI tools need to be developed transparently in cooperation with different stakeholders. Such tools are contingent on the availability of and access to domain-specific data, which are scarce due to the near-invisible nature of forced labour. To the best of our knowledge, this paper presents the first openly accessible English corpus annotated for multi-class and multi-label forced labour detection. The corpus consists of 989 news articles retrieved from specialised data sources and annotated according to risk indicators defined by the International Labour Organization (ILO). Each news article was annotated for two aspects: (1) indicators of forced labour as classification labels and (2) snippets of the text that justify labelling decisions. We hope that our data set can help promote research on explainability for multi-class and multi-label text classification. In this work, we explain our process for collecting the data underpinning the proposed corpus, describe our annotation guidelines and present some statistical analysis of its content. Finally, we summarise the results of baseline experiments based on different variants of the Bidirectional Encoder Representations from Transformers (BERT) model.
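The two annotation aspects described in the abstract (indicator labels plus the text snippets that justify them) can be pictured with a small data structure. This is a hedged sketch rather than the corpus's actual schema: the class `AnnotatedArticle`, its field names, the helper functions, and the three-label subset in `LABELS` are assumptions chosen for illustration, and the example text in the usage note is made up.

```python
from dataclasses import dataclass, field

# Illustrative subset of ILO forced-labour indicators;
# the label set actually used in the corpus may differ.
LABELS = ["Debt bondage", "Excessive overtime", "Withholding of wages"]


@dataclass
class AnnotatedArticle:
    """One news article with multi-label indicators and rationales."""
    text: str
    indicators: list                                 # zero or more labels from LABELS
    rationales: dict = field(default_factory=dict)   # label -> list of justifying snippets


def to_multi_hot(article, labels=LABELS):
    """Encode the article's indicators as a 0/1 vector ordered like `labels`."""
    return [1 if label in article.indicators else 0 for label in labels]


def rationales_are_grounded(article):
    """Check that each rationale belongs to an assigned label and is a span of the text."""
    for label, snippets in article.rationales.items():
        if label not in article.indicators:
            return False
        if any(snippet not in article.text for snippet in snippets):
            return False
    return True
```

For example, an article whose made-up text reads "Workers said their pay was held back for months and shifts ran 16 hours." could carry the labels "Withholding of wages" and "Excessive overtime", with "pay was held back for months" and "shifts ran 16 hours" stored as the respective rationales.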
