Md. Abdur Rahman

A Novel Word Pair-based Gaussian Sentence Similarity Algorithm For Bengali Extractive Text Summarization

Nov 26, 2024

Fahim Morshed, Md. Abdur Rahman, Sumon Ahmed

Abstract: Extractive text summarization is the process of selecting the most representative parts of a larger text without losing any key information. Recent attempts at extractive text summarization in Bengali have either relied on statistical techniques such as TF-IDF or used naive sentence similarity measures such as word averaging. All of these strategies struggle to express semantic relationships correctly. Here, we propose a novel Word pair-based Gaussian Sentence Similarity (WGSS) algorithm for calculating the semantic relation between two sentences. WGSS takes the geometric mean of the individual Gaussian similarity values of word embedding vectors to obtain the semantic relationship between sentences. It compares two sentences on a word-to-word basis, which rectifies the sentence representation problem faced by the word averaging method. The summarization process extracts key sentences by grouping semantically similar sentences into clusters using the Spectral Clustering algorithm. After clustering, we use TF-IDF ranking to pick the best sentence from each cluster. The proposed method is validated on four different datasets, where it outperformed other recent models by 43.2% on average across ROUGE scores (ranging from 2.5% to 95.4%). It was also tested on other low-resource languages, namely Turkish, Marathi, and Hindi, where we find that the proposed method performs about as well as it does for Bengali. In addition, a new high-quality Bengali dataset is curated, containing 250 articles with a pair of summaries for each. We believe this research is a crucial addition to Bengali Natural Language Processing (NLP) research and that it can easily be extended to other low-resource languages. We made the implementation of the proposed model and the data public at https://github.com/FMOpee/WGSS.

* Submitted to ACM Transactions on Asian and Low-Resource Language Information Processing
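
The similarity measure described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation (which is in the linked repository). It assumes that the Gaussian similarity of two word vectors is `exp(-d^2 / (2 * sigma^2))`, where `d` is their Euclidean distance, and that the geometric mean is taken over all word pairs. The function names, the `sigma` parameter, and the all-pairs choice are assumptions made for this example; the abstract does not specify them.

```python
import math


def log_gaussian_similarity(u, v, sigma=1.0):
    """Log of the Gaussian similarity exp(-d^2 / (2 * sigma^2)) of two vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return -squared_distance / (2.0 * sigma ** 2)


def wgss(sentence_a, sentence_b, sigma=1.0):
    """Geometric mean of Gaussian similarities over all word pairs (assumed form).

    Each sentence is a list of word embedding vectors (lists of floats).
    The geometric mean of exp(x_i) equals exp(mean(x_i)), so the mean is
    taken in log space to avoid underflow on long sentences.
    """
    log_sims = [
        log_gaussian_similarity(u, v, sigma)
        for u in sentence_a
        for v in sentence_b
    ]
    if not log_sims:
        raise ValueError("both sentences need at least one word vector")
    return math.exp(sum(log_sims) / len(log_sims))


# Toy 2-dimensional "embeddings" (hypothetical values, for illustration only).
sentence_1 = [[0.0, 0.0], [1.0, 0.0]]
sentence_2 = [[0.1, 0.0], [0.9, 0.1]]
sentence_3 = [[5.0, 5.0], [6.0, 4.0]]

print(wgss(sentence_1, sentence_2))  # close sentences: higher similarity
print(wgss(sentence_1, sentence_3))  # distant sentences: lower similarity
```

A pairwise matrix of such scores could then serve as the affinity matrix for spectral clustering, with TF-IDF ranking used to pick one sentence per cluster, as the abstract describes.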

Performance Analysis of Deep Autoencoder and NCA Dimensionality Reduction Techniques with KNN, ENN and SVM Classifiers

Dec 24, 2019

Md. Abu Bakr Siddique, Shadman Sakib, Md. Abdur Rahman

Abstract: The central aim of this paper is to implement the Deep Autoencoder and Neighborhood Components Analysis (NCA) dimensionality reduction methods in Matlab and to observe how these algorithms behave on nine different datasets from the UCI machine learning repository. These datasets are CNAE9, Movement Libras, Pima Indians Diabetes, Parkinsons, Knowledge, Segmentation, Seeds, Mammographic Masses, and Ionosphere. First, the dimension of each dataset is reduced to fifty percent of its original dimension by selecting and extracting the most relevant features using the Deep Autoencoder and NCA dimensionality reduction techniques. Afterward, each dataset is classified using the K-Nearest Neighbors (KNN), Extended Nearest Neighbors (ENN), and Support Vector Machine (SVM) classification algorithms. All classification algorithms are developed in the Matlab environment. In each classification, the training-to-test data ratio is always set to 90:10. After classification, the variation between accuracies is observed and analyzed to find how compatible each dimensionality reduction technique is with each classifier and to evaluate the performance of each classifier on each dataset.

* 2nd International Conference on Innovation in Engineering and Technology (ICIET)
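
The workflow in the abstract above (halve the feature dimension, then classify with a 90:10 split) can be sketched in Python with scikit-learn. This is an illustrative sketch under stated assumptions, not the paper's Matlab code: it covers only the NCA branch with KNN and SVM, omits the Deep Autoencoder and ENN, and uses scikit-learn's bundled Wine dataset as a stand-in, which is not one of the nine UCI datasets named in the abstract. The function name, `n_neighbors=5`, the RBF kernel, and the standardization step are choices made for this example.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, NeighborhoodComponentsAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def evaluate_nca_pipeline(random_state=0):
    """Reduce features to about half with NCA, then score KNN and SVM."""
    X, y = load_wine(return_X_y=True)  # stand-in dataset (assumption)
    n_components = X.shape[1] // 2     # 13 features -> 6 components

    # 90:10 training-to-test split, as stated in the abstract.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.1, stratify=y, random_state=random_state
    )

    classifiers = {
        "KNN": KNeighborsClassifier(n_neighbors=5),
        "SVM": SVC(kernel="rbf"),
    }
    accuracies = {}
    for name, classifier in classifiers.items():
        # Scaling and NCA are fitted on the training split only.
        model = make_pipeline(
            StandardScaler(),
            NeighborhoodComponentsAnalysis(
                n_components=n_components, random_state=random_state
            ),
            classifier,
        )
        model.fit(X_train, y_train)
        accuracies[name] = model.score(X_test, y_test)
    return n_components, accuracies


n_components, accuracies = evaluate_nca_pipeline()
print(n_components, accuracies)
```

Comparing the per-classifier accuracies across datasets and reduction methods is the kind of analysis the abstract describes; a single split on one small dataset, as here, only illustrates the mechanics.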

Modeling Spammer Behavior: Naïve Bayes vs. Artificial Neural Networks

Aug 19, 2010

Md. Saiful Islam, Shah Mostafa Khaled, Khalid Farhan, Md. Abdur Rahman, Joy Rahman

Abstract: Addressing the problem of spam emails on the Internet, this paper presents a comparative study of Naïve Bayes and Artificial Neural Network (ANN) based modeling of spammer behavior. Keyword-based spam email filtering techniques fall short of modeling spammer behavior because spammers constantly change tactics to circumvent these filters. The evasive tactics that spammers use are themselves patterns that can be modeled to combat spam. It has been observed that both Naïve Bayes and ANN are well suited to modeling common spammer patterns. Experimental results demonstrate that both achieve a promising detection rate of around 92%, a considerable improvement over contemporary keyword-based filtering approaches.

* Proc. of IEEE ICIMT, Jeju Island, South Korea, December 16-18, 2009, pp. 52-55
* 4 pages, 1 figure, 3 tables
