Title: DocBinFormer: A Two-Level Transformer Network for Effective Document Image Binarization

Dec 06, 2023

Risab Biswas, Swalpa Kumar Roy, Ning Wang, Umapada Pal, Guang-Bin Huang

Abstract: Real-world document images suffer from many kinds of degradation that make them harder to recognize and analyze, so binarization is a fundamental step for achieving good performance in any document analysis task. We propose DocBinFormer (Document Binarization Transformer), a novel two-level vision transformer (TL-ViT) architecture for effective document image binarization. The architecture employs a two-level transformer encoder to capture both global and local feature representations from the input images. These complementary bi-level features are exploited for document image binarization, improving results for both system-generated and handwritten document images. The model contains no convolutional layers: the transformer encoder operates directly on pixel patches and sub-patches along with their positional information, while the decoder generates a clean (binarized) output image from the latent representation of the patches. Instead of using a single vision transformer block to extract information from the image patches, the proposed architecture uses two transformer blocks for greater coverage of the extracted feature space on both a global and a local scale. Extensive experiments on a variety of DIBCO and H-DIBCO benchmarks show that the proposed model outperforms state-of-the-art techniques on four metrics. The source code will be made available at https://github.com/RisabBiswas/DocBinFormer.
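The patch and sub-patch tokenization described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `split_into_patches` and `two_level_tokens`, the patch size of 16, and the sub-patch size of 4 are all assumptions chosen for illustration, and the transformer blocks, positional embeddings, and decoder are left out entirely. The sketch only shows how one grayscale image could yield a coarse (global) token per patch and a set of fine (local) tokens per patch.

```python
import numpy as np


def split_into_patches(image, patch_size):
    """Split an (H, W) array into non-overlapping square tiles, row-major."""
    h, w = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image sides must be divisible by patch_size")
    rows, cols = h // patch_size, w // patch_size
    # (rows, P, cols, P) -> (rows, cols, P, P) -> (rows * cols, P, P)
    tiles = image.reshape(rows, patch_size, cols, patch_size).swapaxes(1, 2)
    return tiles.reshape(rows * cols, patch_size, patch_size)


def two_level_tokens(image, patch_size=16, sub_patch_size=4):
    """Return flattened patch tokens and per-patch sub-patch tokens.

    The sizes are hypothetical defaults, not values taken from the paper.
    """
    patches = split_into_patches(image, patch_size)            # (N, P, P)
    sub_patches = np.stack(
        [split_into_patches(p, sub_patch_size) for p in patches]
    )                                                          # (N, M, S, S)
    global_tokens = patches.reshape(len(patches), -1)          # (N, P*P)
    local_tokens = sub_patches.reshape(
        sub_patches.shape[0], sub_patches.shape[1], -1
    )                                                          # (N, M, S*S)
    return global_tokens, local_tokens


if __name__ == "__main__":
    image = np.random.rand(64, 64).astype(np.float32)
    global_tokens, local_tokens = two_level_tokens(image)
    print(global_tokens.shape, local_tokens.shape)
```

In a two-level encoder along these lines, the local tokens would presumably feed one transformer block and the global tokens another, with the results combined before decoding; how DocBinFormer actually fuses the two levels is not specified in the abstract.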
