Abstract: With the growing inclination towards carrying out tasks on computational devices and digital media, any method that converts a previously manual task into a digitized one is welcome. Although many documentation tasks can be done online today, there are still many applications and domains where handwritten text is unavoidable, which makes the digitization of handwritten documents an essential task. Over the past decades, there has been extensive research on offline handwritten text recognition. In the recent past, most of these attempts have shifted to machine learning and deep learning based approaches. Designing more complex and deeper networks that perform well requires larger quantities of annotated data. Most of the databases available for offline handwritten text recognition today have been annotated either manually or semi-automatically with a lot of manual involvement. These processes are very time consuming and prone to human error. To tackle this problem, we present a complete end-to-end pipeline that annotates offline handwritten manuscripts written in both print and cursive English, using deep learning and user interaction techniques. This novel method combines a detection system built upon a state-of-the-art text detection model with a custom-made deep learning model for recognition, and pairs them with an easy-to-use interactive interface. It aims to improve the accuracy of the detection, segmentation, serialization and recognition phases, in order to ensure high-quality annotated data with minimal human interaction.
Abstract: We present Bharati, a simple, novel script that can represent the characters of a majority of contemporary Indian scripts. The shapes and motifs of Bharati characters are drawn from some of the simplest characters of existing Indian scripts. Bharati characters are designed to strictly reflect the underlying phonetic organization, which gives the script the qualities of simplicity, familiarity, and ease of acquisition and use. Employing Bharati as a common script for a majority of Indian languages could therefore ameliorate several existing communication bottlenecks in India. We perform a complexity analysis of handwritten Bharati script and compare its complexity with that of 9 major Indian scripts. The measures of complexity are derived from a theory of handwritten characters based on catastrophe theory. Bharati script is shown to be simpler than the 9 major Indian scripts on most measures of complexity.