Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Seyed Vahid Moravvej

A Novel Plagiarism Detection Approach Combining BERT-based Word Embedding, Attention-based LSTMs and an Improved Differential Evolution Algorithm

May 03, 2023

Seyed Vahid Moravvej, Seyed Jalaleddin Mousavirad, Diego Oliva, Fardin Mohammadi

Figure 1 for A Novel Plagiarism Detection Approach Combining BERT-based Word Embedding, Attention-based LSTMs and an Improved Differential Evolution Algorithm

Figure 2 for A Novel Plagiarism Detection Approach Combining BERT-based Word Embedding, Attention-based LSTMs and an Improved Differential Evolution Algorithm

Figure 3 for A Novel Plagiarism Detection Approach Combining BERT-based Word Embedding, Attention-based LSTMs and an Improved Differential Evolution Algorithm

Figure 4 for A Novel Plagiarism Detection Approach Combining BERT-based Word Embedding, Attention-based LSTMs and an Improved Differential Evolution Algorithm

Abstract:Detecting plagiarism involves finding similar items in two different sources. In this article, we propose a novel method for detecting plagiarism that is based on attention mechanism-based long short-term memory (LSTM) and bidirectional encoder representations from transformers (BERT) word embedding, enhanced with optimized differential evolution (DE) method for pre-training and a focal loss function for training. BERT could be included in a downstream task and fine-tuned as a task-specific BERT can be included in a downstream task and fine-tuned as a task-specific structure, while the trained BERT model is capable of detecting various linguistic characteristics. Unbalanced classification is one of the primary issues with plagiarism detection. We suggest a focal loss-based training technique that carefully learns minority class instances to solve this. Another issue that we tackle is the training phase itself, which typically employs gradient-based methods like back-propagation for the learning process and thus suffers from some drawbacks, including sensitivity to initialization. To initiate the BP process, we suggest a novel DE algorithm that makes use of a clustering-based mutation operator. Here, a winning cluster is identified for the current DE population, and a fresh updating method is used to produce potential answers. We evaluate our proposed approach on three benchmark datasets ( MSRP, SNLI, and SemEval2014) and demonstrate that it performs well when compared to both conventional and population-based methods.

* The paper is submitted to the related journal

Via

Access Paper or Ask Questions

An LSTM-based Plagiarism Detection via Attention Mechanism and a Population-based Approach for Pre-Training Parameters with imbalanced Classes

Oct 17, 2021

Seyed Vahid Moravvej, Seyed Jalaleddin Mousavirad, Mahshid Helali Moghadam, Mehrdad Saadatmand

Figure 1 for An LSTM-based Plagiarism Detection via Attention Mechanism and a Population-based Approach for Pre-Training Parameters with imbalanced Classes

Figure 2 for An LSTM-based Plagiarism Detection via Attention Mechanism and a Population-based Approach for Pre-Training Parameters with imbalanced Classes

Figure 3 for An LSTM-based Plagiarism Detection via Attention Mechanism and a Population-based Approach for Pre-Training Parameters with imbalanced Classes

Figure 4 for An LSTM-based Plagiarism Detection via Attention Mechanism and a Population-based Approach for Pre-Training Parameters with imbalanced Classes

Abstract:Plagiarism is one of the leading problems in academic and industrial environments, which its goal is to find the similar items in a typical document or source code. This paper proposes an architecture based on a Long Short-Term Memory (LSTM) and attention mechanism called LSTM-AM-ABC boosted by a population-based approach for parameter initialization. Gradient-based optimization algorithms such as back-propagation (BP) are widely used in the literature for learning process in LSTM, attention mechanism, and feed-forward neural network, while they suffer from some problems such as getting stuck in local optima. To tackle this problem, population-based metaheuristic (PBMH) algorithms can be used. To this end, this paper employs a PBMH algorithm, artificial bee colony (ABC), to moderate the problem. Our proposed algorithm can find the initial values for model learning in all LSTM, attention mechanism, and feed-forward neural network, simultaneously. In other words, ABC algorithm finds a promising point for starting BP algorithm. For evaluation, we compare our proposed algorithm with both conventional and population-based methods. The results clearly show that the proposed method can provide competitive performance.

* 12 pages, The 28th International Conference on Neural Information Processing (ICONIP2021), BALI, Indonesia

Via

Access Paper or Ask Questions