Tariq Rahim Soomro

An Evaluation of Sindhi Word Embedding in Semantic Analogies and Downstream Tasks

Aug 28, 2024

Wazir Ali, Saifullah Tumrani, Jay Kumar, Tariq Rahim Soomro

Abstract: In this paper, we propose a new word-embedding-based corpus consisting of more than 61 million words crawled from multiple web resources. We design a preprocessing pipeline to filter unwanted text from the crawled data. Afterwards, the cleaned vocabulary is fed to the state-of-the-art continuous-bag-of-words, skip-gram, and GloVe word embedding algorithms. To evaluate the pretrained embeddings, we use popular intrinsic and extrinsic evaluation approaches. The evaluation results reveal that continuous-bag-of-words and skip-gram perform better than GloVe and the existing Sindhi fastText word embeddings on both intrinsic and extrinsic evaluation approaches.

* arXiv admin note: substantial text overlap with arXiv:1911.12579
