Abstract: Vision-language foundation models such as CLIP have shown impressive zero-shot performance on many tasks and datasets, largely thanks to their free-text inputs. However, they struggle with some downstream tasks, such as fine-grained attribute detection and localization. In this paper, we propose a multitask fine-tuning strategy based on a positive/negative prompt formulation to further leverage the capabilities of vision-language foundation models. Using the CLIP architecture as a baseline, we show strong improvements on bird fine-grained attribute detection and localization tasks, while also increasing the classification performance on the CUB200-2011 dataset. We provide source code for reproducibility purposes: it is available at https://github.com/FactoDeepLearning/MultitaskVLFM.
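To make the positive/negative prompt idea from the abstract above concrete, here is a minimal sketch of zero-shot attribute scoring with a positive/negative prompt pair, assuming the openai `clip` package; the prompt wording, the attribute, and the image path are placeholders for illustration, not the prompts actually used in the paper.

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Hypothetical attribute with a positive and a negative prompt.
prompts = clip.tokenize([
    "a photo of a bird with a red beak",      # positive prompt
    "a photo of a bird without a red beak",   # negative prompt
]).to(device)

image = preprocess(Image.open("bird.jpg")).unsqueeze(0).to(device)  # placeholder image

with torch.no_grad():
    image_feat = model.encode_image(image)
    text_feat = model.encode_text(prompts)
    image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    # Softmax over the positive/negative pair gives a presence probability.
    logits = 100.0 * image_feat @ text_feat.T
    p_present = logits.softmax(dim=-1)[0, 0].item()

print(f"P(attribute present) = {p_present:.2f}")
```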
Abstract: Recent advances in handwritten text recognition have made it possible to recognize whole documents in an end-to-end way: the Document Attention Network (DAN) recognizes the characters one after the other through an attention-based prediction process until the end of the document is reached. However, this autoregressive process leads to an inference that cannot benefit from any parallelization optimization. In this paper, we propose Faster DAN, a two-step strategy to speed up the recognition process at prediction time: the model predicts the first character of each text line in the document, and then completes all the text lines in parallel through multi-target queries and a specific document positional encoding scheme. Faster DAN reaches competitive results compared to the standard DAN, while being at least 4 times faster on whole single-page and double-page images of the RIMES 2009, READ 2016 and MAURDOR datasets. Source code and trained model weights are available at https://github.com/FactoDeepLearning/FasterDAN.
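The parallel second step can be illustrated with a small sketch: a stand-in decoder (`dummy_decode_step`, hypothetical, not the released Faster DAN code) extends every text line by one token per call, so the number of sequential steps depends on the longest line rather than on the whole document. The first pass that predicts the first character of each line is omitted and represented by a hard-coded list.

```python
import torch

VOCAB, EOS = 100, 0      # hypothetical vocabulary size and end-of-line token id
MAX_STEPS = 50           # safety cap for this toy example

def dummy_decode_step(features, line_tokens):
    # Stand-in for the multi-target transformer decoder: it would attend to the
    # image features and, thanks to a document positional encoding (line index,
    # position in line), return next-token logits for every line in one pass.
    return torch.randn(len(line_tokens), VOCAB)

def complete_lines_in_parallel(features, first_chars, decode_step):
    """Second pass of the two-step strategy: each text line is its own query,
    and one decoder call per step extends all unfinished lines at once."""
    lines = [[c] for c in first_chars]       # seeded by the (omitted) first pass
    done = [False] * len(lines)
    for _ in range(MAX_STEPS):
        logits = decode_step(features, lines)        # all lines in one call
        next_tokens = logits.argmax(dim=-1).tolist()
        for i, tok in enumerate(next_tokens):
            if done[i]:
                continue
            if tok == EOS:
                done[i] = True
            else:
                lines[i].append(tok)
        if all(done):
            break
    return lines

features = torch.randn(1, 256, 32, 32)       # dummy image features
first_chars = [17, 42, 5]                    # first character of each text line
lines = complete_lines_in_parallel(features, first_chars, dummy_decode_step)
print([len(line) for line in lines])         # decoded length of each line
```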
Abstract: Handwritten text recognition has been widely studied over the last decades for its numerous applications. Nowadays, the state-of-the-art approach consists of a three-step process: the document is segmented into text lines, which are then ordered and recognized. However, this three-step approach has many drawbacks. The three steps are treated independently even though they are closely related. Errors accumulate from one step to the next. The ordering step relies on heuristic rules, which prevents its use for documents with a complex layout or for heterogeneous documents. Finally, the need for additional physical segmentation annotations to train the segmentation stage is inherent to this approach. In this thesis, we propose to tackle these issues by performing handwritten text recognition over whole documents in an end-to-end way. To this aim, we gradually increase the difficulty of the recognition task, moving from isolated lines to paragraphs, and then to whole documents. We first proposed an approach at the line level, based on a fully convolutional network, in order to design a generic feature extraction step for the handwriting recognition task. Building on this preliminary work, we studied two different approaches to recognize handwritten paragraphs. We reached state-of-the-art results at paragraph level on the RIMES 2011, IAM and READ 2016 datasets and outperformed the line-level state of the art on these datasets. We finally proposed the first end-to-end approach dedicated to the recognition of both text and layout at document level. Characters and layout tokens are sequentially predicted following a learned reading order. We proposed two new metrics, which we used to evaluate this task on the RIMES 2009 and READ 2016 datasets, at page level and double-page level.
Abstract: Unconstrained handwritten text recognition is a challenging computer vision task. It is traditionally handled by a two-step approach combining line segmentation followed by text line recognition. For the first time, we propose an end-to-end segmentation-free architecture for the task of handwritten document recognition: the Document Attention Network. In addition to text recognition, the model is trained to label text parts using begin and end tags in an XML-like fashion. This model is made up of an FCN encoder for feature extraction and a stack of transformer decoder layers for a recurrent token-by-token prediction process. It takes whole text documents as input and sequentially outputs characters, as well as logical layout tokens. Contrary to existing segmentation-based approaches, the model is trained without using any segmentation label. We achieve competitive results on the READ 2016 dataset at page level as well as at double-page level, with CERs of 3.53% and 3.69%, respectively. We also provide results for the RIMES 2009 dataset at page level, reaching a CER of 4.54%. We provide all source code and pre-trained model weights at https://github.com/FactoDeepLearning/DAN.
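The encoder/decoder pairing described in this abstract can be sketched with standard PyTorch modules; dimensions, the vocabulary size, and the absence of positional encodings are simplifications for illustration, and this is not the released DAN code.

```python
import torch
import torch.nn as nn

class TinyDocRecognizer(nn.Module):
    def __init__(self, vocab_size=80, d_model=256):
        super().__init__()
        # FCN-style encoder: strided convolutions only, no recurrence.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, d_model, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        # Single output head over characters *and* layout tokens (e.g. begin/end tags).
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, image, tokens):
        # Positional encodings are omitted here for brevity.
        feat = self.encoder(image)                    # (B, C, H', W')
        memory = feat.flatten(2).transpose(1, 2)      # (B, H'*W', C)
        tgt = self.embed(tokens)
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        out = self.decoder(tgt, memory, tgt_mask=mask)
        return self.head(out)                         # next-token logits

model = TinyDocRecognizer()
logits = model(torch.randn(1, 1, 128, 128), torch.randint(0, 80, (1, 10)))
print(logits.shape)   # torch.Size([1, 10, 80])
```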
Abstract: Unconstrained handwriting recognition is an essential task in document analysis. It is usually carried out in two steps: first, the document is segmented into text lines; second, an Optical Character Recognition model is applied to these line images. We propose the Simple Predict & Align Network: an end-to-end recurrence-free Fully Convolutional Network performing OCR at paragraph level without any prior segmentation stage. The framework is as simple as the one used for the recognition of isolated lines, and we achieve competitive results on three popular datasets: RIMES, IAM and READ 2016. The proposed model does not require any dataset adaptation: it can be trained from scratch, without segmentation labels, and it does not require line breaks in the transcription labels. Our code and trained model weights are available at https://github.com/FactoDeepLearning/SPAN.
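One plausible way to perform paragraph-level OCR without a segmentation stage, in the spirit of the abstract above (the released SPAN code may differ), is to let a fully convolutional network predict a character map over the 2D feature grid, concatenate its rows into a single frame sequence, and let the CTC loss align it with the full paragraph transcription.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size = 80                      # characters, with the CTC blank at index 0

fcn = nn.Sequential(                 # toy recurrence-free encoder
    nn.Conv2d(1, 64, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(128, vocab_size, 1),   # per-position character logits
)

image = torch.randn(1, 1, 128, 512)           # (B, 1, H, W) paragraph image
logits = fcn(image)                           # (B, vocab, H', W')
b, c, h, w = logits.shape
# Concatenate the rows of the prediction map into one long frame sequence.
frames = logits.permute(0, 2, 3, 1).reshape(b, h * w, c)
log_probs = F.log_softmax(frames, dim=-1).transpose(0, 1)   # (T, B, vocab)

target = torch.randint(1, vocab_size, (1, 120))  # dummy paragraph transcription
ctc = nn.CTCLoss(blank=0)
loss = ctc(log_probs, target,
           input_lengths=torch.tensor([h * w]),
           target_lengths=torch.tensor([target.size(1)]))
print(loss.item())
```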
Abstract: Unconstrained handwritten text recognition is a major step in most document analysis tasks. It is generally handled by deep recurrent neural networks, and more specifically with Long Short-Term Memory (LSTM) cells. The main drawbacks of these components are the large number of parameters involved and their sequential execution during training and prediction. One alternative to LSTM cells is to compensate for the loss of long-term memory with a heavy use of convolutional layers, whose operations can be executed in parallel and which involve fewer parameters. In this paper, we present a Gated Fully Convolutional Network architecture that is a recurrence-free alternative to the well-known CNN+LSTM architectures. Our model is trained with the CTC loss and shows competitive results on both the RIMES and IAM datasets. We release all code to enable reproduction of our experiments: https://github.com/FactoDeepLearning/LinePytorchOCR.
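A gated convolutional block of the kind the abstract refers to can be written in a few lines; the exact gating used in the paper may differ, but the idea is that a sigmoid branch controls which features of the convolutional branch pass through, with no recurrence involved.

```python
import torch
import torch.nn as nn

class GatedConvBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.feature = nn.Conv2d(channels, channels, 3, padding=1)
        self.gate = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        # Element-wise gating: sigmoid(gate) selects which features pass.
        return self.feature(x) * torch.sigmoid(self.gate(x))

x = torch.randn(2, 32, 64, 256)       # (batch, channels, height, width)
print(GatedConvBlock(32)(x).shape)    # torch.Size([2, 32, 64, 256])
```

Because both branches are plain convolutions, the whole block runs in parallel over the input, unlike an LSTM whose steps must be computed sequentially.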
Abstract: Unconstrained handwritten text recognition remains an important challenge for deep neural networks. In recent years, recurrent networks, and more specifically Long Short-Term Memory networks, have achieved state-of-the-art performance in this field. Nevertheless, they involve a large number of trainable parameters, and training recurrent neural networks cannot be parallelized. This directly affects the training time of such architectures and, as a consequence, the time required to explore various architectures. Recently, recurrence-free architectures such as Fully Convolutional Networks with gated mechanisms have been proposed as one possible alternative, achieving competitive results. In this paper, we explore convolutional architectures and compare them to a CNN+BLSTM baseline. We propose an experimental study of different architectures on an offline handwriting recognition task using the RIMES dataset, as well as a modified version of it in which the images are augmented with notebook backgrounds consisting of printed grids.
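The notebook-background modification can be illustrated with a simple augmentation sketch: a light printed grid is composited behind the handwriting by keeping the darker pixel at each location. The spacing, gray levels, and blending rule here are arbitrary choices for illustration, not the exact procedure used in the paper.

```python
import numpy as np
from PIL import Image, ImageDraw

def add_notebook_grid(img: Image.Image, spacing: int = 24) -> Image.Image:
    """Draw light horizontal and vertical ruling behind the handwriting."""
    grid = Image.new("L", img.size, color=255)
    draw = ImageDraw.Draw(grid)
    w, h = img.size
    for y in range(0, h, spacing):
        draw.line([(0, y), (w, y)], fill=200, width=1)
    for x in range(0, w, spacing):
        draw.line([(x, 0), (x, h)], fill=220, width=1)
    # Keep the darker pixel at each location so the ink stays on top of the grid.
    return Image.fromarray(np.minimum(np.array(img.convert("L")),
                                      np.array(grid)))

line = Image.new("L", (800, 96), color=255)   # placeholder blank line image
augmented = add_notebook_grid(line)
print(augmented.size)
```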
Abstract: Unconstrained handwritten text recognition remains challenging for computer vision systems. Paragraph text recognition is traditionally achieved by two models: the first one for line segmentation and the second one for text line recognition. We propose a unified end-to-end model using hybrid attention to tackle this task. We achieve state-of-the-art character error rates at line and paragraph levels on three popular datasets: 1.90% for RIMES, 4.32% for IAM and 3.63% for READ 2016. The proposed model can be trained from scratch, without using any segmentation label, contrary to the standard approach. Our code and trained model weights are available at https://github.com/FactoDeepLearning/VerticalAttentionOCR.
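A minimal sketch of vertical attention over a paragraph feature map gives an intuition of the approach (this is not the exact hybrid content-plus-location attention of the paper): attention weights computed over the height axis collapse the 2D features into a single line representation that a line-level decoder can then consume.

```python
import torch
import torch.nn as nn

class VerticalAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, features):
        # features: (B, C, H, W) paragraph-level feature map.
        scores = self.score(features)              # (B, 1, H, W)
        weights = torch.softmax(scores, dim=2)     # softmax over the height axis
        # Weighted sum over the vertical axis -> one "line" of features.
        line = (features * weights).sum(dim=2)     # (B, C, W)
        return line, weights

feat = torch.randn(1, 128, 40, 200)
line, attn = VerticalAttention(128)(feat)
print(line.shape)    # torch.Size([1, 128, 200])
```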