Ronaldo Messina

Manifold Mixup improves text recognition with CTC loss

Mar 11, 2019

Bastien Moysset, Ronaldo Messina

Abstract: Modern handwritten text recognition techniques employ deep recurrent neural networks. These techniques are especially effective when a large amount of annotated data is available for parameter estimation. Data augmentation can be used to enhance the performance of the systems when data is scarce. Manifold Mixup is a modern data augmentation method that melds two images, or the feature maps corresponding to these images, and fuses the targets accordingly. We propose to apply Manifold Mixup to text recognition while adapting it to work with a Connectionist Temporal Classification (CTC) cost. We show that Manifold Mixup improves text recognition results on various languages and datasets.
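
The abstract does not say how the targets are fused under CTC. A minimal sketch of one plausible reading follows: the feature maps of two samples are interpolated with a weight `lam`, and the loss is the same interpolation of the two CTC losses computed on the shared output. The function names (`mix_features`, `ctc_neg_log_likelihood`, `mixup_ctc_loss`) and the weighted-sum rule are assumptions made for illustration, not a statement of the authors' implementation. The CTC term is the standard forward algorithm in plain Python, without log-space stabilisation.

```python
import math
import random


def mix_features(feat_a, feat_b, lam):
    """Interpolate two equally shaped 2D feature maps (lists of rows)."""
    return [[lam * x + (1.0 - lam) * y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(feat_a, feat_b)]


def ctc_neg_log_likelihood(probs, target, blank=0):
    """CTC loss via the forward algorithm.

    probs:  list of T frames, each a probability distribution over labels.
    target: list of label indices without blanks.
    """
    # Interleave blanks: [blank, y1, blank, y2, ..., blank]
    ext = [blank]
    for label in target:
        ext += [label, blank]
    size = len(ext)

    alpha = [0.0] * size
    alpha[0] = probs[0][ext[0]]
    if size > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        new_alpha = [0.0] * size
        for s in range(size):
            total = alpha[s]
            if s >= 1:
                total += alpha[s - 1]
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new_alpha[s] = total * probs[t][ext[s]]
        alpha = new_alpha

    likelihood = alpha[-1] + (alpha[-2] if size > 1 else 0.0)
    return -math.log(likelihood) if likelihood > 0.0 else math.inf


def mixup_ctc_loss(probs, target_a, target_b, lam):
    """Assumed fusion rule: weight each target's CTC loss like its input."""
    return (lam * ctc_neg_log_likelihood(probs, target_a)
            + (1.0 - lam) * ctc_neg_log_likelihood(probs, target_b))


# Mixup commonly draws lam from a symmetric Beta distribution;
# the shape value 2.0 here is an arbitrary illustrative choice.
lam = random.betavariate(2.0, 2.0)
```

In a real system `probs` would be the network output for the mixed feature map, so both loss terms are evaluated on the same prediction.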

Adversarial Generation of Handwritten Text Images Conditioned on Sequences

Mar 01, 2019

Eloi Alonso, Bastien Moysset, Ronaldo Messina

Abstract: State-of-the-art offline handwritten text recognition systems tend to use neural networks and therefore require a large amount of annotated data to be trained. In order to partially satisfy this requirement, we propose a system based on Generative Adversarial Networks (GAN) to produce synthetic images of handwritten words. We use bidirectional LSTM recurrent layers to get an embedding of the word to be rendered, and we feed it to the generator network. We also modify the standard GAN by adding an auxiliary network for text recognition. The system is then trained with a balanced combination of an adversarial loss and a CTC loss. Together, these extensions to the GAN make it possible to control the textual content of the generated word images. We obtain realistic images on both French and Arabic datasets, and we show that integrating these synthetic images into the existing training data of a text recognition system can slightly enhance its performance.

Are 2D-LSTM really dead for offline text recognition?

Nov 27, 2018

Bastien Moysset, Ronaldo Messina

Abstract: There is a recent trend in handwritten text recognition with deep neural networks to replace 2D recurrent layers with 1D, and in some cases even completely remove the recurrent layers, relying on simple feed-forward convolutional-only architectures. The most used type of recurrent layer is the Long Short-Term Memory (LSTM). The motivations to do so are many: there are few open-source implementations of 2D-LSTM, even fewer supporting GPU implementations (currently cuDNN only implements 1D-LSTM); 2D recurrences reduce the amount of computations that can be parallelized, and thus possibly increase the training/inference time; recurrences create global dependencies with respect to the input, and sometimes this may not be desirable. Many recent competitions were won by systems that employed networks that use 2D-LSTM layers. Most previous work that compared 1D or pure feed-forward architectures to 2D recurrent models has done so on simple datasets or did not fully optimize the "baseline" 2D model compared to the challenger model, which was duly optimized. In this work, we aim at a fair comparison between 2D and competing models and also extensively evaluate them on more complex datasets that are more representative of challenging "real-world" data, compared to "academic" datasets that are more restricted in their complexity. We aim at determining when and why the 1D and 2D recurrent models have different results. We also compare the results with a language model to assess whether linguistic constraints level the performance of the different networks. Our results show that for challenging datasets, 2D-LSTM networks still seem to provide the highest performance, and we propose a visualization strategy to explain it.

* 12 pages, 4 figures
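
The parallelism argument in the abstract above can be made concrete with a small sketch. It assumes a single scan direction starting at the top-left corner, where cell (i, j) depends on cells (i-1, j) and (i, j-1). Under that assumption, only cells on the same anti-diagonal are independent of each other. The function name `wavefront_schedule` is hypothetical and the code only illustrates the dependency structure, not the paper's implementation.

```python
def wavefront_schedule(height, width):
    """Group grid cells by anti-diagonal i + j (height, width >= 1).

    Cells within one group have no dependency on each other and could be
    computed in parallel; the groups themselves must run in order.
    """
    steps = [[] for _ in range(height + width - 1)]
    for i in range(height):
        for j in range(width):
            steps[i + j].append((i, j))
    return steps


schedule = wavefront_schedule(3, 4)
sequential_steps = len(schedule)                        # height + width - 1
widest_step = max(len(cells) for cells in schedule)     # min(height, width)
```

A 2D scan over a 3 x 4 feature map therefore needs 6 sequential steps with at most 3 cells per step. A 1D recurrence along the width needs 4 sequential steps, and the rows can be handled independently within each step.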

Scan, Attend and Read: End-to-End Handwritten Paragraph Recognition with MDLSTM Attention

Aug 23, 2016

Théodore Bluche, Jérôme Louradour, Ronaldo Messina

Abstract: We present an attention-based model for end-to-end handwriting recognition. Our system does not require any segmentation of the input paragraph. The model is inspired by the differentiable attention models presented recently for speech recognition, image captioning or translation. The main difference is the covert and overt attention, implemented as a multi-dimensional LSTM network. Our principal contribution towards handwriting recognition lies in the automatic transcription without a prior segmentation into lines, which was crucial in previous approaches. To the best of our knowledge, this is the first successful attempt at end-to-end multi-line handwriting recognition. We carried out experiments on the well-known IAM Database. The results are encouraging and give hope of performing full paragraph transcription in the near future.
