Amarnath R

Word and character segmentation directly in run-length compressed handwritten document images

Aug 18, 2019

Amarnath R, P. Nagabhushan, Mohammed Javed

Abstract: Prior work has demonstrated that performing text-line segmentation directly on run-length compressed handwritten document images significantly reduces computational time and memory usage. In this paper, we investigate word and character segmentation directly on run-length compressed document images. First, the spreads of the characters are extracted from the foreground runs of the compressed data, and connected components are then established. The spacing between connected components is larger between adjacent words than within a word. Based on this observation, a threshold is chosen empirically for inter-word separation. Every connected component within a word is then analysed further for character segmentation, where a min-cut graph concept is used to separate touching characters. Over-segmentation and under-segmentation are addressed by insertion and deletion operations, respectively. The approach has been developed particularly for compressed handwritten English document images; however, the model has also been tested on non-English document images.
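The inter-word separation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `group_into_words`, the representation of each connected component as a `(start_col, end_col)` pair, and the threshold value are all assumptions made for illustration. The paper only states that the threshold is chosen empirically.

```python
def group_into_words(spans, gap_threshold):
    """Group connected components of one text line into words.

    spans: list of (start_col, end_col) horizontal extents, one per
           connected component (hypothetical representation).
    gap_threshold: a horizontal gap larger than this starts a new word
                   (assumed value; the paper selects it empirically).
    """
    words = []
    current = []
    prev_end = None
    for start, end in sorted(spans):
        # A gap wider than the threshold is treated as an inter-word space.
        if prev_end is not None and start - prev_end > gap_threshold:
            words.append(current)
            current = []
        current.append((start, end))
        # Track the right-most column seen so far, so overlapping
        # components do not create artificial gaps.
        prev_end = end if prev_end is None else max(prev_end, end)
    if current:
        words.append(current)
    return words


if __name__ == "__main__":
    line_components = [(0, 5), (7, 12), (30, 36), (38, 40)]
    print(group_into_words(line_components, gap_threshold=10))
```

With the sample extents above, the 18-column gap between the second and third components exceeds the threshold, so the four components fall into two words.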

* 17 pages, 19 figures

Spotting Separator Points at Line Terminals in Compressed Document Images for Text-line Segmentation

Aug 18, 2017

Amarnath R, P. Nagabhushan

Abstract: Line separators are used to segregate text-lines from one another in document image analysis. Finding the separator points at every line terminal in a document image enables text-line segmentation. Identifying the separators in handwritten text is particularly challenging, and more so when it is performed on the compressed version of a document image, which is the objective of this research. Working in the compressed domain avoids the computational burden of decompressing a document for text-line segmentation. Since document images are generally compressed using the run-length encoding (RLE) technique as per the CCITT standards, the first column in the RLE will be a white column. The value (depth) in the white column is very low when a particular line is a text line, and the depth can be larger at the point of text-line separation. A longer consecutive sequence of such larger depths indicates the gap between text lines, which provides the separator region. Over-separation and under-separation are corrected by deletion and insertion, respectively. Extensive experiments are conducted on the compressed images of the benchmark datasets of ICDAR13 and Alireza et al. [17] to demonstrate the efficacy of the approach.
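The depth-based idea in the abstract can be sketched in Python as follows. This is only an illustration under stated assumptions, not the published algorithm: the function name `find_separator_rows`, the input format (one leading white-run length per image row), the two thresholds, and the choice of the middle row as the separator point are all hypothetical. The sketch also looks at a single line terminal and omits the corrective insertion and deletion steps.

```python
def find_separator_rows(first_white_runs, depth_threshold, min_gap_rows):
    """Return one candidate separator row per detected inter-line gap.

    first_white_runs: leading white-run length of each row of the
                      RLE data (hypothetical input format).
    depth_threshold:  rows whose leading white run is at least this long
                      are treated as gap candidates (assumed, positive).
    min_gap_rows:     minimum number of consecutive candidate rows that
                      counts as a gap between text lines (assumed).
    """
    separators = []
    run_start = None
    # A trailing 0 acts as a sentinel that closes a run ending at the last row.
    for row, depth in enumerate(list(first_white_runs) + [0]):
        if depth >= depth_threshold:
            if run_start is None:
                run_start = row
        elif run_start is not None:
            if row - run_start >= min_gap_rows:
                # Use the middle row of the run as the separator point.
                separators.append((run_start + row - 1) // 2)
            run_start = None
    return separators


if __name__ == "__main__":
    depths = [2, 3, 40, 40, 40, 1, 2, 50, 4, 45, 45]
    print(find_separator_rows(depths, depth_threshold=30, min_gap_rows=2))
```

In the sample data, the single deep row at index 7 is ignored because it is shorter than `min_gap_rows`, while the runs at rows 2 to 4 and rows 9 to 10 each yield one separator point.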

* International Journal of Computer Applications 172(4): 40-47 (2017)
* Line separators, Document image analysis, Handwritten text, Compression and decompression, RLE, CCITT. Line separator points at every line terminal in compressed handwritten document images, enabling text-line segmentation
