Abstract:This thesis investigates the effectiveness of SimCLR, a contrastive learning technique, in Greek letter recognition, focusing on the impact of various augmentation techniques. We pretrain the SimCLR backbone using the Alpub dataset (pretraining dataset) and fine-tune it on a smaller ICDAR dataset (finetuning dataset) to compare SimCLR's performance against traditional baseline models, which use cross-entropy and triplet loss functions. Additionally, we explore the role of different data augmentation strategies, essential for the SimCLR training process. Methodologically, we examine three primary approaches: (1) a baseline model using cross-entropy loss, (2) a triplet embedding model with a classification layer, and (3) a SimCLR pretrained model with a classification layer. Initially, we train the baseline, triplet, and SimCLR models using 93 augmentations on ResNet-18 and ResNet-50 networks with the ICDAR dataset. From these, the top four augmentations are selected using a statistical t-test. Pretraining of SimCLR is conducted on the Alpub dataset, followed by fine-tuning on the ICDAR dataset. The triplet loss model undergoes a similar process, being pretrained on the top four augmentations before fine-tuning on ICDAR. Our experiments show that SimCLR does not outperform the baselines in letter recognition tasks. The baseline model with cross-entropy loss demonstrates better performance than both SimCLR and the triplet loss model. This study provides a detailed evaluation of contrastive learning for letter recognition, highlighting SimCLR's limitations while emphasizing the strengths of traditional supervised learning models in this task. We believe SimCLR's cropping strategies may cause a semantic shift in the input image, reducing training effectiveness despite the large pretraining dataset. Our code is available at https://github.com/DIVA-DIA/MT_augmentation_and_contrastive_learning/.
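For reference, the contrastive objective that SimCLR-style pretraining optimizes over two augmented views of each image is the NT-Xent (normalized temperature-scaled cross-entropy) loss. Below is a minimal PyTorch sketch of that loss; the batch size, embedding dimension, and temperature are illustrative assumptions, not the thesis' exact configuration.

```python
# Minimal NT-Xent loss sketch, the contrastive objective of SimCLR-style pretraining.
# Shapes, temperature, and batch size are illustrative assumptions.
import torch
import torch.nn.functional as F


def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """z1, z2: (batch, dim) projections of two augmented views of the same images."""
    batch_size = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)        # (2N, dim), unit norm
    sim = torch.matmul(z, z.T) / temperature                  # scaled cosine similarities
    # Mask out self-similarity so each row only competes against the other 2N-1 samples.
    mask = torch.eye(2 * batch_size, dtype=torch.bool, device=z.device)
    sim.masked_fill_(mask, float("-inf"))
    # The positive for view i is the other view of the same image (index i +/- N).
    targets = torch.cat([torch.arange(batch_size) + batch_size,
                         torch.arange(batch_size)]).to(z.device)
    return F.cross_entropy(sim, targets)


# Example with random projections standing in for a ResNet backbone + projection head.
z1, z2 = torch.randn(32, 128), torch.randn(32, 128)
loss = nt_xent_loss(z1, z2)
```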
Abstract:Managing the semantic quality of the categorization in large textual datasets, such as Wikipedia, presents significant challenges in terms of complexity and cost. In this paper, we propose leveraging transformer models to distill semantic information from texts in the Wikipedia dataset and its associated categories into a latent space. We then explore different approaches based on these encodings to assess and enhance the semantic identity of the categories. Our graphical approach is based on convex hulls, while we utilize Hierarchical Navigable Small Worlds (HNSWs) for the hierarchical approach. As a solution to the information loss caused by the dimensionality reduction, we formulate the following mathematical solution: an exponential decay function driven by the Euclidean distances between the high-dimensional encodings of the textual categories. This function represents a filter built around a contextual category and retrieves items with a certain Reconsideration Probability (RP). Retrieving high-RP items serves as a tool for database administrators to improve data groupings by providing recommendations and identifying outliers within a contextual framework.
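One plausible form of such a distance-driven exponential decay filter over the high-dimensional encodings is sketched below; the decay rate and the RP threshold are illustrative assumptions rather than the paper's calibrated values.

```python
# Sketch of an exponential decay filter around a contextual category, computed on
# the high-dimensional encodings. Decay rate and threshold are assumed for illustration.
import numpy as np


def reconsideration_probability(context_vec: np.ndarray,
                                item_vecs: np.ndarray,
                                decay_rate: float = 0.1) -> np.ndarray:
    """RP(item) = exp(-decay_rate * ||item - context||_2)."""
    distances = np.linalg.norm(item_vecs - context_vec, axis=1)
    return np.exp(-decay_rate * distances)


# Retrieve items whose RP with respect to a contextual category exceeds a threshold.
rng = np.random.default_rng(0)
context = rng.normal(size=768)              # e.g. a transformer encoding of the category
items = rng.normal(size=(1000, 768))        # encodings of candidate texts
rp = reconsideration_probability(context, items)
candidates = np.flatnonzero(rp > 0.05)      # items to recommend or flag as outliers
```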
Abstract:Handwriting recognition is a key technology for accessing the content of old manuscripts, helping to preserve cultural heritage. Deep learning achieves impressive performance in solving this task. However, to achieve its full potential, it requires a large amount of labeled data, which is difficult to obtain for ancient languages and scripts. Often, a trade-off has to be made between ground truth quantity and quality, as is the case for the recently introduced Bullinger database. It contains an impressive collection of over a hundred thousand labeled text line images of mostly premodern German and Latin texts that were obtained by automatically aligning existing page-level transcriptions with text line images. However, the alignment process introduces systematic errors, such as wrongly hyphenated words. In this paper, we investigate the impact of such errors on training and evaluation and suggest means to detect and correct typical alignment errors.
Abstract:On-line handwritten character segmentation is often associated with handwriting recognition, and even though recognition models include mechanisms to locate relevant positions during the recognition process, these are typically insufficient to produce a precise segmentation. Decoupling the segmentation from the recognition unlocks the potential to further utilize the result of the recognition. We specifically focus on the scenario where the transcription is known beforehand, in which case the character segmentation becomes an assignment problem between sampling points of the stylus trajectory and characters in the text. Inspired by the $k$-means clustering algorithm, we view it from the perspective of cluster assignment and present a Transformer-based architecture where each cluster is formed based on a learned character query in the Transformer decoder block. In order to assess the quality of our approach, we create character segmentation ground truths for two popular on-line handwriting datasets, IAM-OnDB and HANDS-VNOnDB, and evaluate multiple methods on them, demonstrating that our approach achieves the overall best results.
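The sketch below illustrates the general idea of cluster assignment via learned character queries in a Transformer decoder; the layer sizes, point features, and assignment rule are assumptions for illustration, not the paper's exact architecture.

```python
# Sketch of cluster assignment via learned character queries in a Transformer decoder.
# Feature layout, layer sizes, and the scoring rule are illustrative assumptions.
import torch
import torch.nn as nn


class CharacterQuerySegmenter(nn.Module):
    def __init__(self, d_model: int = 128, max_chars: int = 64):
        super().__init__()
        self.point_encoder = nn.Linear(4, d_model)             # x, y, pen state, time
        self.char_queries = nn.Embedding(max_chars, d_model)   # one learned query per character slot
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)

    def forward(self, points: torch.Tensor, num_chars: int) -> torch.Tensor:
        """points: (batch, n_points, 4). Returns (batch, n_points, num_chars) scores;
        argmax over the last dim assigns each sampling point to a character cluster."""
        memory = self.point_encoder(points)
        queries = self.char_queries.weight[:num_chars].unsqueeze(0).expand(points.size(0), -1, -1)
        char_embeds = self.decoder(queries, memory)            # queries cross-attend to the trajectory
        return torch.einsum("bpd,bcd->bpc", memory, char_embeds)


model = CharacterQuerySegmenter()
scores = model(torch.randn(2, 200, 4), num_chars=10)
assignment = scores.argmax(dim=-1)                             # character index per sampling point
```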
Abstract:Ambiguity is ubiquitous in natural language. Resolving ambiguous meanings is especially important in information retrieval tasks. While word embeddings carry semantic information, they fail to handle ambiguity well. Transformer models have been shown to handle word ambiguity for complex queries, but they cannot be used to identify ambiguous words, e.g. for a 1-word query. Furthermore, training these models is costly in terms of time, hardware resources, and training data, prohibiting their use in specialized environments with sensitive data. Word embeddings can be trained using moderate hardware resources. This paper shows that applying DBSCAN clustering to the latent space can identify ambiguous words and evaluate their level of ambiguity. An automatic DBSCAN parameter selection leads to high-quality clusters, which are semantically coherent and correspond well to the perceived meanings of a given word.
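As a rough illustration of the approach, the sketch below clusters the embedding-space neighbourhood of a word with DBSCAN and counts the resulting clusters as an ambiguity indicator; the neighbourhood size and the eps heuristic are assumptions, not the paper's automatic parameter selection.

```python
# Sketch: cluster a word's embedding-space neighbourhood with DBSCAN; multiple
# dense clusters suggest the word is ambiguous. Embeddings, neighbourhood size,
# and the eps heuristic are illustrative assumptions.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors


def ambiguity_clusters(word_vec, all_vecs, k=50, min_samples=5):
    # Take the k nearest neighbours of the word in the embedding space.
    knn = NearestNeighbors(n_neighbors=k).fit(all_vecs)
    dist, idx = knn.kneighbors(word_vec.reshape(1, -1))
    neighbourhood = all_vecs[idx[0]]
    # Simple eps heuristic: median distance to the neighbours (excluding the word itself).
    eps = np.quantile(dist[0][1:], 0.5)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(neighbourhood)
    n_clusters = len(set(labels) - {-1})       # ignore the noise label -1
    return n_clusters, labels                  # n_clusters > 1 indicates ambiguity


rng = np.random.default_rng(0)
vectors = rng.normal(size=(10000, 300))        # e.g. word2vec / fastText vectors
n_senses, labels = ambiguity_clusters(vectors[0], vectors)
```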
Abstract:In this paper, the concept of a machine-learning-based hands-on detection algorithm is proposed. The hand detection is implemented on the hardware side using a capacitive method. A sensor mat in the steering wheel detects a change in capacitance as soon as the driver's hands approach. The evaluation and final decision about hands-on or hands-off situations is made using machine learning. In order to find a suitable machine learning model, different models are implemented and evaluated. Based on accuracy, memory consumption, and computational effort, the most promising one is selected and ported to a microcontroller. The entire system is then evaluated in terms of reliability and response time.
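A minimal sketch of such a model-selection step could look as follows, comparing a few candidate classifiers on accuracy and serialized model size as a rough memory proxy; the feature layout, toy labels, and candidate models are assumptions for illustration.

```python
# Sketch of comparing candidate classifiers for hands-on/hands-off detection by
# accuracy and serialized size (a rough memory proxy). Data and models are assumed.
import pickle
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 16))            # capacitance readings from the sensor mat (toy data)
y = (X.mean(axis=1) > 0).astype(int)       # 1 = hands-on, 0 = hands-off (toy labels)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

candidates = {
    "logreg": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(max_depth=6),
    "forest": RandomForestClassifier(n_estimators=20, max_depth=6),
}
for name, model in candidates.items():
    model.fit(X_tr, y_tr)
    acc = model.score(X_te, y_te)
    size_kb = len(pickle.dumps(model)) / 1024   # rough footprint before porting to the MCU
    print(f"{name}: accuracy={acc:.3f}, size={size_kb:.1f} kB")
```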
Abstract:Performance on several natural language processing (NLP) tasks has been enhanced thanks to the Transformer-based pre-trained language model BERT. We employ this concept to investigate a local publication database. Research papers are encoded and clustered to form a landscape view of the scientific topics in which research is active. Authors working on similar topics can be identified by calculating the similarity between their papers. Based on this, we define a similarity metric between authors. Additionally, we introduce the concept of self-similarity to indicate the topical variety of authors.
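One plausible way to compute such author similarity and self-similarity from BERT-style paper embeddings is sketched below; aggregating by mean pairwise cosine similarity is an assumption for illustration, not necessarily the paper's exact metric.

```python
# Sketch of author similarity and self-similarity from paper embeddings.
# The mean-pairwise-cosine aggregation is an illustrative assumption.
import numpy as np


def cosine_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def author_similarity(papers_a: np.ndarray, papers_b: np.ndarray) -> float:
    """Mean pairwise cosine similarity between two authors' paper embeddings."""
    return float(cosine_matrix(papers_a, papers_b).mean())


def self_similarity(papers: np.ndarray) -> float:
    """Mean cosine similarity among an author's own papers (off-diagonal only);
    a low value indicates high topical variety."""
    sims = cosine_matrix(papers, papers)
    n = len(papers)
    return float((sims.sum() - n) / (n * (n - 1)))


rng = np.random.default_rng(0)
author_a = rng.normal(size=(12, 768))   # e.g. 12 papers encoded with BERT
author_b = rng.normal(size=(8, 768))
print(author_similarity(author_a, author_b), self_similarity(author_a))
```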
Abstract:The process of transforming a raster image into a vector representation is known as image tracing. This study investigates several processing methods, including high-pass filtering, autoencoding, and vectorization, to extract an abstract representation of an image. According to the findings, reconstructing an image with an autoencoder, high-pass filtering it, and then vectorizing it yields a more abstract representation of the image while increasing the effectiveness of the vectorization process.
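A rough sketch of such a pipeline is given below, with a placeholder autoencoder and a simple contour-approximation vectorizer standing in for the study's actual components; the input path is likewise assumed.

```python
# Rough pipeline sketch: autoencoder reconstruction -> high-pass filter -> vectorize.
# The autoencoder is a placeholder and the contour-based vectorizer is an assumption,
# not the study's exact implementation.
import cv2
import numpy as np


def high_pass(img: np.ndarray, sigma: float = 3.0) -> np.ndarray:
    """High-pass = original minus a Gaussian-blurred (low-pass) copy."""
    low = cv2.GaussianBlur(img, (0, 0), sigma)
    return cv2.subtract(img, low)


def vectorize(img: np.ndarray, epsilon: float = 2.0):
    """Approximate binary contours with polygons as a minimal vector representation."""
    _, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(binary, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.approxPolyDP(c, epsilon, closed=True) for c in contours]


def autoencode(img: np.ndarray) -> np.ndarray:
    # Placeholder for the trained autoencoder's reconstruction step.
    return img


image = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)   # assumed path to a raster image
paths = vectorize(high_pass(autoencode(image)))
```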
Abstract:This project explores the feasibility of remote patient monitoring based on the analysis of 3D movements captured with smartwatches. We base our analysis on the Kinematic Theory of Rapid Human Movement. We have validated our research in a real-world scenario for stroke rehabilitation at the Guttmann Institute (a neurorehabilitation hospital), showing promising results. Our work could have a great impact on remote healthcare applications, improving medical efficiency and reducing healthcare costs. Future steps include more clinical validation, developing multi-modal analysis architectures (analysing data from sensors, images, audio, etc.), and exploring the application of our technology to monitor other neurodegenerative diseases.
Abstract:B\"uchi Automata on infinite words present many interesting problems and are used frequently in program verification and model checking. A lot of these problems on B\"uchi automata are computationally hard, raising the question if a learning-based data-driven analysis might be more efficient than using traditional algorithms. Since B\"uchi automata can be represented by graphs, graph neural networks are a natural choice for such a learning-based analysis. In this paper, we demonstrate how graph neural networks can be used to reliably predict basic properties of B\"uchi automata when trained on automatically generated random automata datasets.