Automatic Term Recognition is used to extract terms specific to a given domain. To be accurate, these corpus- and language-dependent methods require large volumes of textual data from which candidate terms are extracted and then scored according to a given metric. To improve text preprocessing as well as candidate term extraction and scoring, we propose a distributed Spark-based architecture for automatically extracting domain-specific terms. Our main contributions are as follows: (1) we propose a novel distributed architecture for automatic domain-specific multi-word term recognition built on top of the Spark ecosystem; (2) we perform an in-depth analysis of our architecture in terms of accuracy and scalability; (3) we design an easy-to-integrate Python implementation that enables the use of Big Data processing in fields such as Computational Linguistics and Natural Language Processing. We empirically demonstrate the feasibility of our architecture through experiments on two real-world datasets.
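
To illustrate the pipeline shape described above (distributed candidate extraction followed by metric-based scoring), the following minimal PySpark sketch counts word n-grams as candidate multi-word terms and ranks them with a simplified C-value-style heuristic. This is an illustrative assumption, not the paper's implementation; the candidate filter and scoring formula are placeholders.

```python
# Minimal, illustrative PySpark sketch (hypothetical, not the authors' code):
# documents are tokenized, candidate multi-word terms are collected as word
# n-grams, and each candidate is scored by log2(term length) * frequency,
# a simplified C-value-style metric.
from pyspark.sql import SparkSession
import math
import re

spark = SparkSession.builder.appName("term-extraction-sketch").getOrCreate()
sc = spark.sparkContext

docs = [
    "distributed spark based architecture for term recognition",
    "automatic term recognition extracts domain specific terms",
]

def candidates(text, max_len=4):
    """Yield word n-grams (2..max_len tokens) as candidate multi-word terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    for n in range(2, max_len + 1):
        for i in range(len(tokens) - n + 1):
            yield (" ".join(tokens[i:i + n]), 1)

# Count candidate frequencies across the distributed corpus.
freqs = (sc.parallelize(docs)
           .flatMap(candidates)
           .reduceByKey(lambda a, b: a + b))

# Score each candidate with a simplified frequency-length heuristic.
scored = freqs.map(lambda kv: (kv[0], math.log2(len(kv[0].split())) * kv[1]))

for term, score in scored.top(5, key=lambda kv: kv[1]):
    print(term, round(score, 2))

spark.stop()
```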