Hillel Taub-Tabib

Hierarchy Builder: Organizing Textual Spans into a Hierarchy to Facilitate Navigation

Sep 18, 2023

Itay Yair, Hillel Taub-Tabib, Yoav Goldberg

Abstract: Information extraction systems often produce hundreds to thousands of strings on a specific topic. We present a method that facilitates better consumption of these strings, in an exploratory setting in which a user wants to both get a broad overview of what's available, and a chance to dive deeper on some aspects. The system works by grouping similar items together and arranging the remaining items into a hierarchical navigable DAG structure. We apply the method to medical information extraction.

* Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), Pages 282-290, Toronto, Canada, July 2023
* 9 pages including citations; Presented at the ACL 2023 DEMO track, pages 282-290

A Dataset for N-ary Relation Extraction of Drug Combinations

May 04, 2022

Aryeh Tiktinsky, Vijay Viswanathan, Danna Niezni, Dana Meron Azagury, Yosi Shamay, Hillel Taub-Tabib, Tom Hope, Yoav Goldberg

Abstract: Combination therapies have become the standard of care for diseases such as cancer, tuberculosis, malaria and HIV. However, the combinatorial set of available multi-drug treatments creates a challenge in identifying effective combination therapies available in a situation. To assist medical professionals in identifying beneficial drug-combinations, we construct an expert-annotated dataset for extracting information about the efficacy of drug combinations from the scientific literature. Beyond its practical utility, the dataset also presents a unique NLP challenge, as the first relation extraction dataset consisting of variable-length relations. Furthermore, the relations in this dataset predominantly require language understanding beyond the sentence level, adding to the challenge of this task. We provide a promising baseline model and identify clear areas for further improvement. We release our dataset, code, and baseline models publicly to encourage the NLP community to participate in this task.

* To appear in NAACL 2022

Large Scale Substitution-based Word Sense Induction

Oct 14, 2021

Matan Eyal, Shoval Sadde, Hillel Taub-Tabib, Yoav Goldberg

Abstract: We present a word-sense induction method based on pre-trained masked language models (MLMs), which can cheaply scale to large vocabularies and large corpora. The result is a corpus which is sense-tagged according to a corpus-derived sense inventory and where each sense is associated with indicative words. Evaluation on English Wikipedia that was sense-tagged using our method shows that both the induced senses and the per-instance sense assignment are of high quality, even compared to WSD methods such as Babelfy. Furthermore, by training a static word embeddings algorithm on the sense-tagged corpus, we obtain high-quality static senseful embeddings. These outperform existing senseful embeddings techniques on the WiC dataset and on a new outlier detection dataset we developed. The data-driven nature of the algorithm allows inducing corpora-specific senses, which may not appear in standard sense inventories, as we demonstrate using a case study on the scientific domain.

Neural Extractive Search

Jun 08, 2021

Shauli Ravfogel, Hillel Taub-Tabib, Yoav Goldberg

Abstract: Domain experts often need to extract structured information from large corpora. We advocate for a search paradigm called "extractive search", in which a search query is enriched with capture-slots, to allow for such rapid extraction. Such an extractive search system can be built around syntactic structures, resulting in high-precision, low-recall results. We show how the recall can be improved using neural retrieval and alignment. The goals of this paper are to concisely introduce the extractive-search paradigm; and to demonstrate a prototype neural retrieval system for extractive search and its benefits and potential. Our prototype is available at https://spike.neural-sim.apps.allenai.org/ and a video demonstration is available at https://vimeo.com/559586687.

* Accepted as a demo paper at ACL 2021

Bootstrapping Relation Extractors using Syntactic Search by Examples

Feb 09, 2021

Matan Eyal, Asaf Amrami, Hillel Taub-Tabib, Yoav Goldberg

Abstract: The advent of neural networks in NLP brought with it substantial improvements in supervised relation extraction. However, obtaining a sufficient quantity of training data remains a key challenge. In this work we propose a process for bootstrapping training datasets which can be performed quickly by non-NLP-experts. We take advantage of search engines over syntactic graphs (such as Shlain et al. (2020)) which expose a friendly by-example syntax. We use these to obtain positive examples by searching for sentences that are syntactically similar to user input examples. We apply this technique to relations from TACRED and DocRED and show that the resulting models are competitive with models trained on manually annotated data and on data obtained from distant supervision. The models also outperform models trained using NLG data augmentation techniques. Extending the search-based approach with the NLG method further improves the results.

* EACL 2021

Interactive Extractive Search over Biomedical Corpora

Jun 07, 2020

Hillel Taub-Tabib, Micah Shlain, Shoval Sadde, Dan Lahav, Matan Eyal, Yaara Cohen, Yoav Goldberg

Abstract: We present a system that allows life-science researchers to search a linguistically annotated corpus of scientific texts using patterns over dependency graphs, as well as using patterns over token sequences and a powerful variant of boolean keyword queries. In contrast to previous attempts at dependency-based search, we introduce a light-weight query language that does not require the user to know the details of the underlying linguistic representations, and instead lets the user query the corpus by providing an example sentence coupled with simple markup. Search is performed at an interactive speed due to an efficient linguistic graph-indexing and retrieval engine. This allows for rapid exploration, development and refinement of user queries. We demonstrate the system using example workflows over two corpora: the PubMed corpus including 14,446,243 PubMed abstracts and the CORD-19 dataset, a collection of over 45,000 research papers focused on COVID-19 research. The system is publicly available at https://allenai.github.io/spike

Syntactic Search by Example

Jun 04, 2020

Micah Shlain, Hillel Taub-Tabib, Shoval Sadde, Yoav Goldberg

Abstract: We present a system that allows a user to search a large linguistically annotated corpus using syntactic patterns over dependency graphs. In contrast to previous attempts to this effect, we introduce a light-weight query language that does not require the user to know the details of the underlying syntactic representations, and instead lets the user query the corpus by providing an example sentence coupled with simple markup. Search is performed at an interactive speed due to an efficient linguistic graph-indexing and retrieval engine. This allows for rapid exploration, development and refinement of syntax-based queries. We demonstrate the system using queries over two corpora: the English Wikipedia, and a collection of English PubMed abstracts. A demo of the Wikipedia system is available at: https://allenai.github.io/spike
