Title: Qalam: A Multimodal LLM for Arabic Optical Character and Handwriting Recognition

Jul 18, 2024

Gagan Bhatia, El Moatez Billah Nagoudi, Fakhraddin Alwajih, Muhammad Abdul-Mageed

Abstract: Arabic Optical Character Recognition (OCR) and Handwriting Recognition (HWR) pose unique challenges due to the cursive and context-sensitive nature of the Arabic script. This study introduces Qalam, a novel foundation model for Arabic OCR and HWR built on a SwinV2 encoder and a RoBERTa decoder. Our model significantly outperforms existing methods, achieving a Word Error Rate (WER) of just 0.80% on HWR tasks and 1.18% on OCR tasks. We train Qalam on a diverse dataset that includes over 4.5 million images from Arabic manuscripts and a synthetic dataset of 60k image-text pairs. Notably, Qalam handles Arabic diacritics exceptionally well, a critical feature of Arabic script. It also processes high-resolution inputs, addressing a common limitation of current OCR systems. These advancements underscore Qalam's potential as a leading solution for Arabic script recognition, offering a significant leap in accuracy and efficiency.
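The WER figures above (0.80% for HWR, 1.18% for OCR) are word-level edit distances normalized by the length of the reference transcript. The sketch below is a minimal, generic WER implementation, assuming whitespace tokenization and no text normalization; the paper's exact tokenization and normalization (for example, how diacritics or hamza variants are treated) may differ, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Generic WER sketch: (substitutions + deletions + insertions) / reference word count.

    Assumes plain whitespace tokenization; this is not necessarily the
    normalization used to produce the reported Qalam numbers.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    reference = "ذهب الولد إلى المدرسة"
    hypothesis = "ذهب الولد الى المدرسة"  # one word differs by a single character
    print(word_error_rate(reference, hypothesis))  # 0.25
```

Under this definition a single-character difference makes the whole word an error, so a reported WER of 0.80% corresponds to roughly 8 word errors per 1,000 reference words.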
