Title: DiT: Self-supervised Pre-training for Document Image Transformer

Apr 12, 2022

Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei

Abstract: Image Transformer has recently achieved significant progress for natural image understanding, either using supervised (ViT, DeiT, etc.) or self-supervised (BEiT, MAE, etc.) pre-training techniques. In this paper, we propose DiT, a self-supervised pre-trained Document Image Transformer model using large-scale unlabeled text images for Document AI tasks, which is essential since no supervised counterparts ever exist due to the lack of human labeled document images. We leverage DiT as the backbone network in a variety of vision-based Document AI tasks, including document image classification, document layout analysis, table detection as well as text detection for OCR. Experiment results have illustrated that the self-supervised pre-trained DiT model achieves new state-of-the-art results on these downstream tasks, e.g. document image classification (91.11 → 92.69), document layout analysis (91.0 → 94.9), table detection (94.23 → 96.55) and text detection for OCR (93.07 → 94.29). The code and pre-trained models are publicly available at https://aka.ms/msdit.
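To make the reported improvements easier to compare, the short sketch below computes the absolute gain for each task from the before/after figures quoted in the abstract. The abstract does not name the metric behind each pair, so the numbers are treated here simply as scores; the names `reported_scores` and `gains` are illustrative and do not come from the paper or its code.

```python
# Before/after scores exactly as quoted in the abstract.
# The metric for each task is not stated there, so none is assumed here.
reported_scores = {
    "document image classification": (91.11, 92.69),
    "document layout analysis": (91.0, 94.9),
    "table detection": (94.23, 96.55),
    "text detection for OCR": (93.07, 94.29),
}

# Absolute gain = new score minus previous best, rounded to two decimals
# to suppress floating-point noise.
gains = {
    task: round(after - before, 2)
    for task, (before, after) in reported_scores.items()
}

# Task with the largest absolute gain among the four.
largest_gain_task = max(gains, key=gains.get)

for task, gain in gains.items():
    print(f"{task}: +{gain:.2f}")
print(f"largest absolute gain: {largest_gain_task}")
```

Because the four pairs may be measured with different metrics, these gains are not directly comparable across tasks; the calculation only restates the abstract's figures as differences.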

* Work in Progress
