Abdel Belaïd

LORIA

Client-Driven Content Extraction Associated with Table

Apr 06, 2013

K. C. Santosh, Abdel Belaïd

Abstract: The goal of the project is to extract content within tables in document images based on learnt patterns. Real-world users, i.e., clients, first provide a set of key fields within the table that they consider important. These fields are used to build a graph in which nodes are labelled with semantics and other features, and edges are attributed with relations. This attributed relational graph (ARG) is then employed to mine similar graphs from a document image. Each mined graph represents an item within the table, and hence a set of such graphs composes a table. We have validated the concept on a real-world industrial problem.

* Machine Vision Applications (2013)
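
The abstract above describes building an attributed relational graph from client-selected key fields and then mining similar graphs from a document. The following is a minimal sketch of that idea, not the authors' implementation. The field labels (`ref`, `qty`, `price`), the grid coordinates (`row`, `col`), the relation vocabulary (`left`, `right`, `above`, `below`) and the brute-force matching are all assumptions made for illustration.

```python
from itertools import product

def relation(a, b):
    """Coarse spatial relation from field a to field b (hypothetical vocabulary)."""
    if a["row"] == b["row"]:
        return "right" if b["col"] > a["col"] else "left"
    return "below" if b["row"] > a["row"] else "above"

def build_arg(fields):
    """Build a tiny ARG: node labels plus a relation on every ordered pair."""
    nodes = [f["label"] for f in fields]
    edges = {
        (i, j): relation(fields[i], fields[j])
        for i in range(len(fields))
        for j in range(len(fields))
        if i != j
    }
    return nodes, edges

def mine(pattern, fields):
    """Return index tuples of document fields that reproduce the pattern graph."""
    p_nodes, p_edges = pattern
    # Candidate document fields for each pattern node, selected by label.
    candidates = [
        [k for k, f in enumerate(fields) if f["label"] == label]
        for label in p_nodes
    ]
    matches = []
    for combo in product(*candidates):
        if len(set(combo)) != len(combo):
            continue  # a document field may fill only one pattern node
        if all(
            relation(fields[combo[i]], fields[combo[j]]) == rel
            for (i, j), rel in p_edges.items()
        ):
            matches.append(combo)
    return matches

# Key fields a client might mark on one example table row (assumed data).
client_fields = [
    {"label": "ref", "row": 0, "col": 0},
    {"label": "qty", "row": 0, "col": 1},
    {"label": "price", "row": 0, "col": 2},
]
# Labelled fields detected in a new document: two table rows (assumed data).
document_fields = [
    {"label": "ref", "row": 5, "col": 0},
    {"label": "qty", "row": 5, "col": 1},
    {"label": "price", "row": 5, "col": 2},
    {"label": "ref", "row": 6, "col": 0},
    {"label": "qty", "row": 6, "col": 1},
    {"label": "price", "row": 6, "col": 2},
]

pattern = build_arg(client_fields)
items = mine(pattern, document_fields)
```

Each tuple in `items` corresponds to one mined graph, that is, one table item. The exhaustive search over label candidates is exponential in the pattern size and is used here only to keep the sketch short.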

Handwritten and Printed Text Separation in Real Document

Mar 19, 2013

Abdel Belaïd, K. C. Santosh, Vincent Poulain D'Andecy

Abstract: The aim of the paper is to separate handwritten and printed text in real documents that contain noise and graphics, including annotations. Relying on the run-length smoothing algorithm (RLSA), the extracted pseudo-lines and pseudo-words are used as basic blocks for classification. A multi-class support vector machine (SVM) with a Gaussian kernel performs a first labelling of each pseudo-word, taking its local neighbourhood into account. It then propagates the context between neighbours so that possible labelling errors can be corrected. To address running-time complexity, we propose linear-complexity methods that use k-NN with constraints. When a kd-tree is used, the running time is almost linearly proportional to the number of pseudo-words. The performance of our system is close to 90%, even with a very small learning dataset whose samples are basically composed of complex administrative documents.

* Machine Vision Applications (2013)
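
The abstract above relies on the run-length smoothing algorithm (RLSA) to obtain pseudo-words. The following is a minimal sketch of horizontal RLSA on a binary image stored as nested lists, where 1 is ink and 0 is background. It is not the authors' code: the function names, the `max_gap` threshold, and the choice to fill only gaps bounded by ink on both sides are assumptions made for illustration.

```python
def rlsa_row(row, max_gap):
    """Fill background runs of length <= max_gap lying between two ink pixels."""
    out = list(row)
    last_ink = None  # index of the most recent ink pixel seen so far
    for i, px in enumerate(row):
        if px == 1:
            if last_ink is not None and 0 < i - last_ink - 1 <= max_gap:
                for j in range(last_ink + 1, i):
                    out[j] = 1
            last_ink = i
    return out

def rlsa_horizontal(image, max_gap):
    """Apply horizontal smoothing independently to every row of the image."""
    return [rlsa_row(row, max_gap) for row in image]

def ink_spans(row):
    """Return inclusive (start, end) spans of consecutive ink pixels in a row."""
    spans = []
    start = None
    for i, px in enumerate(row):
        if px == 1 and start is None:
            start = i
        elif px == 0 and start is not None:
            spans.append((start, i - 1))
            start = None
    if start is not None:
        spans.append((start, len(row) - 1))
    return spans

# One scan line: small gaps inside words, one wide gap between words (assumed data).
scan_line = [1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1]
smoothed = rlsa_row(scan_line, max_gap=2)
pseudo_words = ink_spans(smoothed)
```

With `max_gap=2`, the one-pixel gaps are merged while the four-pixel gap is kept, so the smoothed line yields two spans that play the role of pseudo-words. A larger threshold would merge them into a single pseudo-line segment.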
