Martin Blum

Hidden Entity Detection from GitHub Leveraging Large Language Models

Jan 08, 2025

Lu Gan, Martin Blum, Danilo Dessi, Brigitte Mathiak, Ralf Schenkel, Stefan Dietze

Abstract: Named entity recognition is an important task when constructing knowledge bases from unstructured data sources. Whereas entity detection methods mostly rely on extensive training data, Large Language Models (LLMs) have paved the way towards approaches based on zero-shot learning (ZSL) or few-shot learning (FSL), which exploit the capabilities LLMs acquire during pretraining. ZSL and FSL open new opportunities especially in highly specialized scenarios where large-scale training data is not available. Following this recent trend, this paper investigates the potential of LLMs in such scenarios for automatically detecting datasets and software within textual content from GitHub repositories. While existing methods focused solely on named entities, this study broadens the scope to include resources such as repositories and online hubs, where entities are also represented by URLs. The study explores different FSL prompt learning approaches to enhance the LLMs' ability to identify dataset and software mentions within repository texts. Through analyses of LLM effectiveness and learning strategies, this paper offers insights into the potential of advanced language models for automated entity detection.

* Accepted at the KDD 2024 workshop DL4KG
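To make the few-shot setup described in the abstract more concrete, the sketch below assembles a few-shot prompt for dataset and software detection in repository text and parses a JSON answer. It is a minimal illustration under stated assumptions, not the authors' prompt or pipeline: the instruction wording, the JSON output format, the demonstration texts (with `example.org` placeholder URLs), and the helper names `build_few_shot_prompt` and `parse_entities` are all hypothetical. No LLM is called; a hard-coded response stands in for the model's answer.

```python
import json
from dataclasses import dataclass


@dataclass
class Demo:
    """One labelled demonstration shown to the model (hypothetical data)."""
    text: str
    datasets: list[str]
    software: list[str]


# Hypothetical demonstrations for illustration only; not taken from the paper.
# Note that entities may be names or URLs, as the abstract describes.
DEMOS = [
    Demo("We train the classifier with scikit-learn on the corpus at "
         "https://example.org/corpus.zip.",
         ["https://example.org/corpus.zip"], ["scikit-learn"]),
    Demo("Results on the ToyReviews dataset were produced with PyTorch.",
         ["ToyReviews"], ["PyTorch"]),
]

# Assumed instruction text and output format (JSON with two keys).
INSTRUCTION = (
    "Extract all dataset and software mentions from the text, including "
    "mentions given only as URLs. Answer with JSON of the form "
    '{"datasets": [...], "software": [...]}.'
)


def format_answer(datasets: list[str], software: list[str]) -> str:
    """Serialize gold labels in the same JSON format the model should emit."""
    return json.dumps({"datasets": datasets, "software": software})


def build_few_shot_prompt(text: str, demos: list[Demo] = DEMOS) -> str:
    """Instruction, then k solved demonstrations, then the unsolved query."""
    parts = [INSTRUCTION]
    for d in demos:
        parts.append(f"Text: {d.text}\nAnswer: {format_answer(d.datasets, d.software)}")
    parts.append(f"Text: {text}\nAnswer:")
    return "\n\n".join(parts)


def parse_entities(response: str) -> dict[str, list[str]]:
    """Parse the model's JSON answer; fall back to empty lists if malformed."""
    empty: dict[str, list[str]] = {"datasets": [], "software": []}
    try:
        data = json.loads(response.strip())
    except json.JSONDecodeError:
        return empty
    if not isinstance(data, dict):
        return empty
    result: dict[str, list[str]] = {}
    for key in ("datasets", "software"):
        values = data.get(key, [])
        result[key] = [v for v in values if isinstance(v, str)] if isinstance(values, list) else []
    return result


if __name__ == "__main__":
    prompt = build_few_shot_prompt("Trained with TensorFlow on https://example.org/images.tar.")
    print(prompt)
    # A real pipeline would send `prompt` to an LLM; this response is a stand-in.
    fake_response = '{"datasets": ["https://example.org/images.tar"], "software": ["TensorFlow"]}'
    print(parse_entities(fake_response))
```

Keeping the demonstrations' answers in exactly the JSON format requested from the model is one common way to make few-shot outputs machine-parseable; the defensive parser reflects that LLM answers are not guaranteed to be valid JSON.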

SchenQL: A query language for bibliographic data with aggregations and domain-specific functions

May 13, 2022

Christin Katharina Kreutz, Martin Blum, Ralf Schenkel

Abstract: Current search interfaces of digital libraries cannot directly satisfy complex or convoluted information needs, such as "Find authors who only recently started working on a topic". At best, they allow such information to be obtained only through extensive user interaction. We present SchenQL, a web interface for a domain-specific query language on bibliographic metadata that supports information search and exploration through query formulation and navigation within the system. Our system focuses on supporting data aggregation and providing specialised, domain-dependent functions while remaining suitable for both domain experts and casual users of digital libraries.

* Accepted at JCDL'22 as a demo, 5 pages, 4 figures
