Koren Lazar

SpeCrawler: Generating OpenAPI Specifications from API Documentation Using Large Language Models

Feb 18, 2024

Koren Lazar, Matan Vetzler, Guy Uziel, David Boaz, Esther Goldbraich, David Amid, Ateret Anaby-Tavor

Abstract: In the digital era, the widespread use of APIs is evident. However, scalable utilization of APIs poses a challenge due to the structural divergence observed in online API documentation. This underscores the need for automatic tools to facilitate API consumption. A viable approach involves converting documentation into an API Specification format. While previous attempts have used rule-based methods, these approaches had difficulty generalizing across diverse documentation. In this paper, we introduce SpeCrawler, a comprehensive system that utilizes large language models (LLMs) to generate OpenAPI Specifications from diverse API documentation through a carefully crafted pipeline. By creating a standardized format for numerous APIs, SpeCrawler helps streamline integration processes within API orchestration systems and facilitates the incorporation of tools into LLMs. The paper explores SpeCrawler's methodology, supported by empirical evidence and case studies, demonstrating its efficacy through LLM capabilities.

* Under Review for KDD 2024
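
The abstract does not spell out SpeCrawler's pipeline, so the following is only an illustration of the target format: a minimal Python sketch that serializes endpoint information, assumed to have already been extracted from documentation (for example by an LLM, which is not shown), into an OpenAPI 3.0 document. The `Param`, `Endpoint`, and `build_openapi_spec` names are hypothetical and are not part of SpeCrawler.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Param:
    name: str
    location: str               # OpenAPI "in" value: "query", "path", "header" or "cookie"
    required: bool = False
    schema_type: str = "string"


@dataclass
class Endpoint:
    path: str                   # e.g. "/users/{id}"
    method: str                 # e.g. "GET"
    summary: str = ""
    params: list = field(default_factory=list)


def build_openapi_spec(title: str, version: str, endpoints: list) -> dict:
    """Assemble a minimal OpenAPI 3.0 document from pre-extracted endpoints."""
    paths = {}
    for ep in endpoints:
        operation = {
            "summary": ep.summary,
            "parameters": [
                {
                    "name": p.name,
                    "in": p.location,
                    # OpenAPI requires path parameters to be marked required
                    "required": p.required or p.location == "path",
                    "schema": {"type": p.schema_type},
                }
                for p in ep.params
            ],
            # OpenAPI 3.0 requires at least one response per operation
            "responses": {"200": {"description": "Successful response"}},
        }
        # Several HTTP methods can share one path item
        paths.setdefault(ep.path, {})[ep.method.lower()] = operation
    return {"openapi": "3.0.3", "info": {"title": title, "version": version}, "paths": paths}


if __name__ == "__main__":
    spec = build_openapi_spec(
        "Example API", "1.0.0",
        [Endpoint("/users/{id}", "GET", "Get a user", [Param("id", "path")])],
    )
    print(json.dumps(spec, indent=2))
```

Keeping extraction and serialization separate means the output structure stays valid regardless of how noisy the upstream extraction step is.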

QAID: Question Answering Inspired Few-shot Intent Detection

Mar 21, 2023

Asaf Yehudai, Matan Vetzler, Yosi Mass, Koren Lazar, Doron Cohen, Boaz Carmeli

Abstract: Intent detection with semantically similar fine-grained intents is a challenging task. To address it, we reformulate intent detection as a question-answering retrieval task by treating utterances and intent names as questions and answers. To that end, we utilize a question-answering retrieval architecture and adopt a two-stage training scheme with a batch contrastive loss. In the pre-training stage, we improve query representations through self-supervised training. Then, in the fine-tuning stage, we increase contextualized token-level similarity scores between queries and answers from the same intent. Our method achieves state-of-the-art performance on three few-shot intent detection benchmarks.

* ICLR paper
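
The fine-tuning stage above combines token-level similarity with a batch contrastive loss. A minimal NumPy sketch of that combination follows. It assumes a ColBERT-style late-interaction ("MaxSim") score and a plain in-batch softmax cross-entropy over the intent names in the batch; the function names and the `temperature` parameter are hypothetical, and this is not necessarily the exact loss used in QAID.

```python
import numpy as np


def maxsim(q_tokens: np.ndarray, a_tokens: np.ndarray) -> float:
    """Late-interaction score (assumed ColBERT-style): each query token is
    matched to its most similar answer token, and the maxima are summed."""
    sims = q_tokens @ a_tokens.T            # (n_query_tokens, n_answer_tokens)
    return float(sims.max(axis=1).sum())


def batch_contrastive_loss(queries, answers, labels, temperature=1.0):
    """queries: list of (n_q, d) token-embedding arrays (utterances).
    answers: list of (n_a, d) token-embedding arrays (intent names).
    labels[i]: index of the answer sharing query i's intent.
    The other answers in the batch act as negatives."""
    scores = np.array([[maxsim(q, a) for a in answers] for q in queries]) / temperature
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(queries)), labels].mean())
```

Minimizing this loss pushes each utterance's token-level score with its own intent name above its scores with the other intent names in the batch, which matches the fine-tuning objective described in the abstract.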

Collecting a Large-Scale Gender Bias Dataset for Coreference Resolution and Machine Translation

Sep 10, 2021

Shahar Levy, Koren Lazar, Gabriel Stanovsky

Abstract: Recent works have found evidence of gender bias in machine translation and coreference resolution models, mostly using synthetic diagnostic datasets. While these quantify bias in a controlled experiment, they often do so on a small scale and consist mostly of artificial, out-of-distribution sentences. In this work, we find grammatical patterns indicating stereotypical and non-stereotypical gender-role assignments (e.g., female nurses versus male dancers) in corpora from three domains, resulting in the first large-scale gender bias dataset of 108K diverse real-world English sentences. We manually verify the quality of our corpus and use it to evaluate gender bias in various coreference resolution and machine translation models. We find that all tested models tend to over-rely on gender stereotypes when presented with natural inputs, which may be especially harmful when deployed in commercial systems. Finally, we show that our dataset lends itself to fine-tuning a coreference resolution model, and find that this mitigates bias on a held-out set. Our dataset and models are publicly available at www.github.com/SLAB-NLP/BUG. We hope they will spur future research into gender bias evaluation and mitigation techniques in realistic settings.

* Accepted to Findings of EMNLP 2021
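
One common way to quantify the over-reliance on stereotypes described above is to compare a model's accuracy on stereotypical versus anti-stereotypical sentences. The following Python sketch computes that gap; the `Example` record and the choice of metric are illustrative assumptions, not necessarily the evaluation protocol used in the paper.

```python
from dataclasses import dataclass


@dataclass
class Example:
    stereotypical: bool      # does the gold gender match the occupation stereotype?
    gold_gender: str         # e.g. "female"
    predicted_gender: str    # gender the model assigned (via coreference or translation)


def accuracy(examples) -> float:
    """Fraction of examples whose predicted gender matches the gold gender."""
    if not examples:
        return float("nan")
    return sum(e.gold_gender == e.predicted_gender for e in examples) / len(examples)


def stereotype_gap(examples) -> float:
    """Accuracy on stereotypical minus accuracy on anti-stereotypical examples.
    A positive gap means the model does better when gender follows the stereotype."""
    stereo = [e for e in examples if e.stereotypical]
    anti = [e for e in examples if not e.stereotypical]
    return accuracy(stereo) - accuracy(anti)
```

A model with no stereotype reliance would score a gap near zero; a large positive gap reflects the behaviour the abstract reports for all tested models.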

Filling the Gaps in Ancient Akkadian Texts: A Masked Language Modelling Approach

Sep 09, 2021

Koren Lazar, Benny Saret, Asaf Yehudai, Wayne Horowitz, Nathan Wasserman, Gabriel Stanovsky

Abstract: We present models that complete missing text given transliterations of ancient Mesopotamian documents, originally written on cuneiform clay tablets (2500 BCE - 100 CE). Due to the tablets' deterioration, scholars often rely on contextual cues to manually fill in missing parts of the text, a subjective and time-consuming process. We identify that this challenge can be formulated as a masked language modelling task, an objective used mostly for pretraining contextualized language models. Building on this, we develop several architectures focusing on the Akkadian language, the lingua franca of the time. We find that despite data scarcity (1M tokens), we can achieve state-of-the-art performance on missing-token prediction (89% hit@5) using a greedy decoding scheme and pretraining on data from other languages and different time periods. Finally, we conduct human evaluations showing the applicability of our models in assisting experts to transcribe texts in extinct languages.

* Accepted to EMNLP 2021 (Main Conference)
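
The hit@5 figure reported above measures how often the correct token appears among a model's five highest-ranked candidates for a masked position. A minimal Python sketch of that metric follows, assuming each prediction is already a ranked list of candidate tokens; the function name is hypothetical and the example tokens are illustrative only.

```python
def hit_at_k(ranked_predictions, gold, k=5):
    """Fraction of masked positions whose gold token appears in the top-k
    candidates. ranked_predictions[i] is ordered from most to least likely."""
    if len(ranked_predictions) != len(gold):
        raise ValueError("need one ranked candidate list per gold token")
    if not gold:
        raise ValueError("no masked positions to evaluate")
    hits = sum(g in preds[:k] for preds, g in zip(ranked_predictions, gold))
    return hits / len(gold)


if __name__ == "__main__":
    # Illustrative candidates for three masked positions in a transliteration
    preds = [["a-na", "i-na", "u"], ["šar-ri", "LUGAL"], ["be-li"]]
    gold = ["i-na", "LUGAL", "a-na"]
    print(hit_at_k(preds, gold, k=1), hit_at_k(preds, gold, k=2))
```

Because the metric credits any of the top k guesses, it fits the use case in the abstract, where a model proposes candidate restorations for an expert to choose from.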
