Abstract:A large number of publications are available for the Optical Character Recognition (OCR). Significant researches, as well as articles are present for the Latin, Chinese and Japanese scripts. Arabic script is also one of mature script from OCR perspective. The adaptive languages which share Arabic script or its extended characters; still lacking the OCRs for their language. In this paper we present the efforts of researchers on Arabic and its related and adapted languages. This survey is organized in different sections, in which introduction is followed by properties of Sindhi Language. OCR process techniques and methods used by various researchers are presented. The last section is dedicated for future work and conclusion is also discussed.
Abstract:This paper presents a novel combinational phonetic algorithm for Sindhi Language, to be used in developing Sindhi Spell Checker which has yet not been developed prior to this work. The compound textual forms and glyphs of Sindhi language presents a substantial challenge for developing Sindhi spell checker system and generating similar suggestion list for misspelled words. In order to implement such a system, phonetic based Sindhi language rules and patterns must be considered into account for increasing the accuracy and efficiency. The proposed system is developed with a blend between Phonetic based SoundEx algorithm and ShapeEx algorithm for pattern or glyph matching, generating accurate and efficient suggestion list for incorrect or misspelled Sindhi words. A table of phonetically similar sounding Sindhi characters for SoundEx algorithm is also generated along with another table containing similar glyph or shape based character groups for ShapeEx algorithm. Both these are first ever attempt of any such type of categorization and representation for Sindhi Language.
Abstract:Statistical error Correction technique is the most accurate and widely used approach today, but for a language like Sindhi which is a low resourced language the trained corpora's are not available, so the statistical techniques are not possible at all. Instead a useful alternative would be to exploit various spelling error trends in Sindhi by using a Rule based approach. For designing such technique an essential prerequisite would be to study the various error patterns in a language. This pa per presents various studies of spelling error trends and their types in Sindhi Language. The research shows that the error trends common to all languages are also encountered in Sindhi but their do exist some error patters that are catered specifically to a Sindhi language.
Abstract:Dictionaries are essence of any language providing vital linguistic recourse for the language learners, researchers and scholars. This paper focuses on the methodology and techniques used in developing software architecture for a UBSESD (Unicode Based Sindhi to English and English to Sindhi Dictionary). The proposed system provides an accurate solution for construction and representation of Unicode based Sindhi characters in a dictionary implementing Hash Structure algorithm and a custom java Object as its internal data structure saved in a file. The System provides facilities for Insertion, Deletion and Editing of new records of Sindhi. Through this framework any type of Sindhi to English and English to Sindhi Dictionary (belonging to different domains of knowledge, e.g. engineering, medicine, computer, biology etc.) could be developed easily with accurate representation of Unicode Characters in font independent manner.
Abstract:This paper describes the design and implementation of a Unicode-based GUISL (Graphical User Interface for Sindhi Language). The idea is to provide a software platform to the people of Sindh as well as Sindhi diasporas living across the globe to make use of computing for basic tasks such as editing, composition, formatting, and printing of documents in Sindhi by using GUISL. The implementation of the GUISL has been done in the Java technology to make the system platform independent. The paper describes several design issues of Sindhi GUI in the context of existing software tools and technologies and explains how mapping and concatenation techniques have been employed to achieve the cursive shape of Sindhi script.