Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Anna Scius-Bertrand

Zero-Shot Prompting and Few-Shot Fine-Tuning: Revisiting Document Image Classification Using Large Language Models

Dec 18, 2024

Anna Scius-Bertrand, Michael Jungo, Lars Vögtlin, Jean-Marc Spat, Andreas Fischer

Abstract:Classifying scanned documents is a challenging problem that involves image, layout, and text analysis for document understanding. Nevertheless, for certain benchmark datasets, notably RVL-CDIP, the state of the art is closing in to near-perfect performance when considering hundreds of thousands of training samples. With the advent of large language models (LLMs), which are excellent few-shot learners, the question arises to what extent the document classification problem can be addressed with only a few training samples, or even none at all. In this paper, we investigate this question in the context of zero-shot prompting and few-shot model fine-tuning, with the aim of reducing the need for human-annotated training samples as much as possible.

* International Conference on Pattern Recognition - ICPR 2024, pp 152-166. Cham: Springer Nature Switzerland
* ICPR 2024

Via

Access Paper or Ask Questions

Impact of Ground Truth Quality on Handwriting Recognition

Dec 14, 2023

Michael Jungo, Lars Vögtlin, Atefeh Fakhari, Nathan Wegmann, Rolf Ingold, Andreas Fischer, Anna Scius-Bertrand

Abstract:Handwriting recognition is a key technology for accessing the content of old manuscripts, helping to preserve cultural heritage. Deep learning shows an impressive performance in solving this task. However, to achieve its full potential, it requires a large amount of labeled data, which is difficult to obtain for ancient languages and scripts. Often, a trade-off has to be made between ground truth quantity and quality, as is the case for the recently introduced Bullinger database. It contains an impressive amount of over a hundred thousand labeled text line images of mostly premodern German and Latin texts that were obtained by automatically aligning existing page-level transcriptions with text line images. However, the alignment process introduces systematic errors, such as wrongly hyphenated words. In this paper, we investigate the impact of such errors on training and evaluation and suggest means to detect and correct typical alignment errors.

* SOICT 2023: The 12th International Symposium on Information and Communication Technology
* SOICT 2023

Via

Access Paper or Ask Questions

The RPM3D project: 3D Kinematics for Remote Patient Monitoring

Dec 09, 2022

Alicia Fornés, Asma Bensalah, Cristina Carmona-Duarte, Jialuo Chen, Miguel A. Ferrer, Andreas Fischer, Josep Lladós, Cristina Martín, Eloy Opisso, Réjean Plamondon(+2 more)

Abstract:This project explores the feasibility of remote patient monitoring based on the analysis of 3D movements captured with smartwatches. We base our analysis on the Kinematic Theory of Rapid Human Movement. We have validated our research in a real case scenario for stroke rehabilitation at the Guttmann Institute5 (neurorehabilitation hospital), showing promising results. Our work could have a great impact in remote healthcare applications, improving the medical efficiency and reducing the healthcare costs. Future steps include more clinical validation, developing multi-modal analysis architectures (analysing data from sensors, images, audio, etc.), and exploring the application of our technology to monitor other neurodegenerative diseases.

Via

Access Paper or Ask Questions