Title: CLIPTER: Looking at the Bigger Picture in Scene Text Recognition

Jan 18, 2023

Aviad Aberdam, David Bensaïd, Alona Golts, Roy Ganz, Oren Nuriel, Royee Tichauer, Shai Mazor, Ron Litman

Abstract: Understanding the scene is often essential for reading text in real-world scenarios. However, current scene text recognizers operate on cropped text images, unaware of the bigger picture. In this work, we harness the representative power of recent vision-language models, such as CLIP, to provide the crop-based recognizer with scene, image-level information. Specifically, we obtain a rich representation of the entire image and fuse it with the recognizer word-level features via cross-attention. Moreover, a gated mechanism is introduced that gradually shifts to the context-enriched representation, enabling simply fine-tuning a pretrained recognizer. We implement our model-agnostic framework, named CLIPTER - CLIP Text Recognition, on several leading text recognizers and demonstrate consistent performance gains, achieving state-of-the-art results over multiple benchmarks. Furthermore, an in-depth analysis reveals improved robustness to out-of-vocabulary words and enhanced generalization in low-data regimes.
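The fusion step described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify the form of the gate, so the tanh gate with a scalar `alpha` starting at zero is an assumption, as are the single attention head, the shared feature dimension, the random stand-in features, and every function and variable name below.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(word_feats, image_feats, w_q, w_k, w_v):
    """Single-head cross-attention: word-level features query image-level features."""
    q = matmul(word_feats, w_q)
    k = matmul(image_feats, w_k)
    v = matmul(image_feats, w_v)
    scale = 1.0 / math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s * scale for s in row]) for row in scores]
    return matmul(weights, v)


def gated_fusion(word_feats, image_feats, w_q, w_k, w_v, alpha):
    """Add the attended scene context to the word features through a gate.

    Assumption: the gate is tanh(alpha) with alpha initialised to 0, so the
    output equals the original recognizer features before fine-tuning starts.
    """
    attended = cross_attention(word_feats, image_feats, w_q, w_k, w_v)
    gate = math.tanh(alpha)
    return [
        [x + gate * a for x, a in zip(x_row, a_row)]
        for x_row, a_row in zip(word_feats, attended)
    ]


def random_matrix(rows, cols, rng):
    """Random stand-in for learned weights or extracted features."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
dim = 4
word_feats = random_matrix(3, dim, rng)   # stand-in for recognizer word-level features
image_feats = random_matrix(5, dim, rng)  # stand-in for image-level (e.g. CLIP) features
w_q, w_k, w_v = (random_matrix(dim, dim, rng) for _ in range(3))

fused_at_init = gated_fusion(word_feats, image_feats, w_q, w_k, w_v, alpha=0.0)
fused_later = gated_fusion(word_feats, image_feats, w_q, w_k, w_v, alpha=1.0)
```

With `alpha=0.0` the gate is closed and the fused features are identical to the recognizer's own features, which is one way a pretrained recognizer could be fine-tuned without disrupting its starting behaviour. As `alpha` moves away from zero, more of the scene context is mixed in.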
