Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Anastasiia Fadeeva

InkFM: A Foundational Model for Full-Page Online Handwritten Note Understanding

Mar 29, 2025

Anastasiia Fadeeva, Vincent Coriou, Diego Antognini, Claudiu Musat, Andrii Maksai

Figure 1 for InkFM: A Foundational Model for Full-Page Online Handwritten Note Understanding

Figure 2 for InkFM: A Foundational Model for Full-Page Online Handwritten Note Understanding

Figure 3 for InkFM: A Foundational Model for Full-Page Online Handwritten Note Understanding

Figure 4 for InkFM: A Foundational Model for Full-Page Online Handwritten Note Understanding

Abstract:Tablets and styluses are increasingly popular for taking notes. To optimize this experience and ensure a smooth and efficient workflow, it's important to develop methods for accurately interpreting and understanding the content of handwritten digital notes. We introduce a foundational model called InkFM for analyzing full pages of handwritten content. Trained on a diverse mixture of tasks, this model offers a unique combination of capabilities: recognizing text in 28 different scripts, mathematical expressions recognition, and segmenting pages into distinct elements like text and drawings. Our results demonstrate that these tasks can be effectively unified within a single model, achieving SoTA text line segmentation out-of-the-box quality surpassing public baselines like docTR. Fine- or LoRA-tuning our base model on public datasets further improves the quality of page segmentation, achieves state-of the art text recognition (DeepWriting, CASIA, SCUT, and Mathwriting datasets) and sketch classification (QuickDraw). This adaptability of InkFM provides a powerful starting point for developing applications with handwritten input.

Via

Access Paper or Ask Questions

Representing Online Handwriting for Recognition in Large Vision-Language Models

Feb 23, 2024

Anastasiia Fadeeva, Philippe Schlattner, Andrii Maksai, Mark Collier, Efi Kokiopoulou, Jesse Berent, Claudiu Musat

Figure 1 for Representing Online Handwriting for Recognition in Large Vision-Language Models

Figure 2 for Representing Online Handwriting for Recognition in Large Vision-Language Models

Figure 3 for Representing Online Handwriting for Recognition in Large Vision-Language Models

Figure 4 for Representing Online Handwriting for Recognition in Large Vision-Language Models

Abstract:The adoption of tablets with touchscreens and styluses is increasing, and a key feature is converting handwriting to text, enabling search, indexing, and AI assistance. Meanwhile, vision-language models (VLMs) are now the go-to solution for image understanding, thanks to both their state-of-the-art performance across a variety of tasks and the simplicity of a unified approach to training, fine-tuning, and inference. While VLMs obtain high performance on image-based tasks, they perform poorly on handwriting recognition when applied naively, i.e., by rendering handwriting as an image and performing optical character recognition (OCR). In this paper, we study online handwriting recognition with VLMs, going beyond naive OCR. We propose a novel tokenized representation of digital ink (online handwriting) that includes both a time-ordered sequence of strokes as text, and as image. We show that this representation yields results comparable to or better than state-of-the-art online handwriting recognizers. Wide applicability is shown through results with two different VLM families, on multiple public datasets. Our approach can be applied to off-the-shelf VLMs, does not require any changes in their architecture, and can be used in both fine-tuning and parameter-efficient tuning. We perform a detailed ablation study to identify the key elements of the proposed representation.

Via

Access Paper or Ask Questions

DSS: Synthesizing long Digital Ink using Data augmentation, Style encoding and Split generation

Nov 29, 2023

Aleksandr Timofeev, Anastasiia Fadeeva, Andrei Afonin, Claudiu Musat, Andrii Maksai

Abstract:As text generative models can give increasingly long answers, we tackle the problem of synthesizing long text in digital ink. We show that the commonly used models for this task fail to generalize to long-form data and how this problem can be solved by augmenting the training data, changing the model architecture and the inference procedure. These methods use contrastive learning technique and are tailored specifically for the handwriting domain. They can be applied to any encoder-decoder model that works with digital ink. We demonstrate that our method reduces the character error rate on long-form English data by half compared to baseline RNN and by 16% compared to the previous approach that aims at addressing the same problem. We show that all three parts of the method improve recognizability of generated inks. In addition, we evaluate synthesized data in a human study and find that people perceive most of generated data as real.

* Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14190, pages 217-235, Springer, Cham

Via

Access Paper or Ask Questions