Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Mohamed Gabr

A Dynamic Programming Framework for Discovering Count and Values of Multilevel Image Thresholding

May 26, 2026

Eslam Hegazy, Mohamed Gabr

Abstract:Multilevel Image thresholding is an important preprocessing algorithm in computer vision applications nowadays. Since most common thresholding methods take the desired count of thresholds as input by the user, thresholding methods that automatically determines a suitable count of thresholds from the input image itself are advantageous. In this article, a novel thresholding method based on a dynamic programming algorithm and a modification of Minimum Error Thresholding (MET) criterion is thoroughly presented. An empirical statistical study is performed to pinpoint why this proposed method is superior. Moreover, an extended comparison between this proposed method and other state-of-the-art methods is performed on a comprehensive set of natural, satellite and medical test images. The numerical results show that the proposed MET-DP method takes much less time than traditional dynamic programming thresholding methods when the number of thresholds is high. The proposed method can detect a suitable count of thresholds for most of tested images of different types. However, traditional methods that take the count of thresholds as input produce thresholded images of higher structural similarity index measure (SSIM) and peak signal-to-noise ratio (PSNR) values than MET-DP. Source code can be found on https://w3id.org/met-dp/article1-code

Via

Access Paper or Ask Questions

Image Thresholding: Understanding Bias of Evaluation Metrics towards Specific Evaluation Functions

May 26, 2026

Eslam Hegazy, Mohamed Gabr

Abstract:Multilevel image thresholding is widely used for segmentation in applications ranging from medical imaging to remote sensing. Classical objective functions, such as Otsu's between-class variance and Kapur's entropy, are often optimized using metaheuristic algorithms, with performance evaluated via metrics like Structural Similarity Index (SSIM) and Peak Signal-to-Noise Ratio (PSNR). These evaluations implicitly assume that SSIM and PSNR provide unbiased measures of segmentation quality. In this study, we examine this assumption by analyzing the correlation between thresholding objective functions and quality metrics across all possible thresholds for images in the BSDS500 dataset. Results show that Otsu's criterion consistently exhibits high correlation with both SSIM and PSNR, while Kapur's entropy demonstrates weaker and more variable correlation. Otsu outperforms Kapur in correlation with PSNR for all images and with SSIM for over 91%. Our findings reveal an inherent metric-objective-function bias. This work highlights the need for more neutral evaluation frameworks and motivates extending the analysis to additional thresholding criteria and domains. Source code of this paper can be found at https://w3id.org/met-dp/icpr26-95

* Submitted to ICPR 2026 (https://icpr2026.org)

Via

Access Paper or Ask Questions

Rosetta Stone at KSAA-RD Shared Task: A Hop From Language Modeling To Word--Definition Alignment

Oct 24, 2023

Ahmed ElBakry, Mohamed Gabr, Muhammad ElNokrashy, Badr AlKhamissi

Figure 1 for Rosetta Stone at KSAA-RD Shared Task: A Hop From Language Modeling To Word--Definition Alignment

Figure 2 for Rosetta Stone at KSAA-RD Shared Task: A Hop From Language Modeling To Word--Definition Alignment

Figure 3 for Rosetta Stone at KSAA-RD Shared Task: A Hop From Language Modeling To Word--Definition Alignment

Figure 4 for Rosetta Stone at KSAA-RD Shared Task: A Hop From Language Modeling To Word--Definition Alignment

Abstract:A Reverse Dictionary is a tool enabling users to discover a word based on its provided definition, meaning, or description. Such a technique proves valuable in various scenarios, aiding language learners who possess a description of a word without its identity, and benefiting writers seeking precise terminology. These scenarios often encapsulate what is referred to as the "Tip-of-the-Tongue" (TOT) phenomena. In this work, we present our winning solution for the Arabic Reverse Dictionary shared task. This task focuses on deriving a vector representation of an Arabic word from its accompanying description. The shared task encompasses two distinct subtasks: the first involves an Arabic definition as input, while the second employs an English definition. For the first subtask, our approach relies on an ensemble of finetuned Arabic BERT-based models, predicting the word embedding for a given definition. The final representation is obtained through averaging the output embeddings from each model within the ensemble. In contrast, the most effective solution for the second subtask involves translating the English test definitions into Arabic and applying them to the finetuned models originally trained for the first subtask. This straightforward method achieves the highest score across both subtasks.

* ArabicNLP 2023

Via

Access Paper or Ask Questions

How Good Are GPT Models at Machine Translation? A Comprehensive Evaluation

Feb 18, 2023

Amr Hendy, Mohamed Abdelrehim, Amr Sharaf, Vikas Raunak, Mohamed Gabr, Hitokazu Matsushita, Young Jin Kim, Mohamed Afify, Hany Hassan Awadalla

Figure 1 for How Good Are GPT Models at Machine Translation? A Comprehensive Evaluation

Figure 2 for How Good Are GPT Models at Machine Translation? A Comprehensive Evaluation

Figure 3 for How Good Are GPT Models at Machine Translation? A Comprehensive Evaluation

Figure 4 for How Good Are GPT Models at Machine Translation? A Comprehensive Evaluation

Abstract:Generative Pre-trained Transformer (GPT) models have shown remarkable capabilities for natural language generation, but their performance for machine translation has not been thoroughly investigated. In this paper, we present a comprehensive evaluation of GPT models for machine translation, covering various aspects such as quality of different GPT models in comparison with state-of-the-art research and commercial systems, effect of prompting strategies, robustness towards domain shifts and document-level translation. We experiment with eighteen different translation directions involving high and low resource languages, as well as non English-centric translations, and evaluate the performance of three GPT models: ChatGPT, GPT3.5 (text-davinci-003), and text-davinci-002. Our results show that GPT models achieve very competitive translation quality for high resource languages, while having limited capabilities for low resource languages. We also show that hybrid approaches, which combine GPT models with other translation systems, can further enhance the translation quality. We perform comprehensive analysis and human evaluation to further understand the characteristics of GPT translations. We hope that our paper provides valuable insights for researchers and practitioners in the field and helps to better understand the potential and limitations of GPT models for translation.

Via

Access Paper or Ask Questions

Adapting MARBERT for Improved Arabic Dialect Identification: Submission to the NADI 2021 Shared Task

Mar 01, 2021

Badr AlKhamissi, Mohamed Gabr, Muhammad ElNokrashy, Khaled Essam

Figure 1 for Adapting MARBERT for Improved Arabic Dialect Identification: Submission to the NADI 2021 Shared Task

Figure 2 for Adapting MARBERT for Improved Arabic Dialect Identification: Submission to the NADI 2021 Shared Task

Figure 3 for Adapting MARBERT for Improved Arabic Dialect Identification: Submission to the NADI 2021 Shared Task

Figure 4 for Adapting MARBERT for Improved Arabic Dialect Identification: Submission to the NADI 2021 Shared Task

Abstract:In this paper, we tackle the Nuanced Arabic Dialect Identification (NADI) shared task (Abdul-Mageed et al., 2021) and demonstrate state-of-the-art results on all of its four subtasks. Tasks are to identify the geographic origin of short Dialectal (DA) and Modern Standard Arabic (MSA) utterances at the levels of both country and province. Our final model is an ensemble of variants built on top of MARBERT that achieves an F1-score of 34.03% for DA at the country-level development set -- an improvement of 7.63% from previous work.

* This work was accepted at the Sixth Arabic Natural Language Processing Workshop (EACL/WANLP 2021)

Via

Access Paper or Ask Questions

Deep Diacritization: Efficient Hierarchical Recurrence for Improved Arabic Diacritization

Nov 01, 2020

Badr AlKhamissi, Muhammad N. ElNokrashy, Mohamed Gabr

Figure 1 for Deep Diacritization: Efficient Hierarchical Recurrence for Improved Arabic Diacritization

Figure 2 for Deep Diacritization: Efficient Hierarchical Recurrence for Improved Arabic Diacritization

Figure 3 for Deep Diacritization: Efficient Hierarchical Recurrence for Improved Arabic Diacritization

Figure 4 for Deep Diacritization: Efficient Hierarchical Recurrence for Improved Arabic Diacritization

Abstract:We propose a novel architecture for labelling character sequences that achieves state-of-the-art results on the Tashkeela Arabic diacritization benchmark. The core is a two-level recurrence hierarchy that operates on the word and character levels separately---enabling faster training and inference than comparable traditional models. A cross-level attention module further connects the two, and opens the door for network interpretability. The task module is a softmax classifier that enumerates valid combinations of diacritics. This architecture can be extended with a recurrent decoder that optionally accepts priors from partially diacritized text, which improves results. We employ extra tricks such as sentence dropout and majority voting to further boost the final result. Our best model achieves a WER of 5.34%, outperforming the previous state-of-the-art with a 30.56% relative error reduction.

* This work was accepted at the Fifth Arabic Natural Language Processing Workshop (COLING/WANLP 2020)

Via

Access Paper or Ask Questions