Abstract: Knowledge Tracing (KT) aims to predict students' future performance by tracking the development of their knowledge states. Despite recent progress in this field, the application of KT models in education systems is still restricted from a data perspective: 1) limited access to real-life data due to data protection concerns, 2) a lack of diversity in public datasets, and 3) noise in benchmark datasets, such as duplicate records. To address these problems, we simulated student data with three statistical strategies based on public datasets and tested their performance with two KT baselines. While we observe only minor performance improvements from additional synthetic data, our work shows that training on synthetic data alone can achieve performance similar to training on real data.
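The abstract does not specify which three statistical simulation strategies were used, so the following is only a minimal sketch of one plausible strategy: sampling correctness from a guess/slip model whose per-skill mastery follows a simple learning curve. All names and parameters here are hypothetical.

    # Illustrative statistical simulation of KT interaction data.
    # The guess/slip learning-curve model below is an assumption, not
    # necessarily one of the paper's three strategies.
    import numpy as np

    rng = np.random.default_rng(0)

    def simulate_student(n_steps, n_skills, learn_rate=0.1, guess=0.2, slip=0.1):
        """Simulate one student's interaction sequence of (skill_id, correct)."""
        mastery = np.zeros(n_skills)          # latent skill mastery in [0, 1]
        records = []
        for _ in range(n_steps):
            skill = rng.integers(n_skills)    # pick a skill uniformly
            # probability of a correct answer under a guess/slip model
            p_correct = mastery[skill] * (1 - slip) + (1 - mastery[skill]) * guess
            correct = rng.random() < p_correct
            records.append((skill, int(correct)))
            # practising a skill moves mastery toward 1
            mastery[skill] += learn_rate * (1 - mastery[skill])
        return records

    synthetic = [simulate_student(n_steps=50, n_skills=10) for _ in range(1000)]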
Abstract: People with visual impairments have difficulty accessing touchscreen-enabled personal computing devices such as mobile phones and laptops. Image-to-speech (ITS) systems can help mitigate this problem, but their large model size makes them extremely hard to deploy on low-resource embedded devices. In this paper, we aim to overcome this challenge by developing an efficient end-to-end neural architecture for generating audio from tiny segments of display content on low-resource devices. We introduce a vision-transformer-based image encoder and use knowledge distillation to compress the model from 6.1 million to 2.46 million parameters. Human and automatic evaluation results show that our approach leads to only a minimal drop in performance and speeds up inference by 22%.
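The abstract does not give the paper's exact distillation objective; a minimal sketch of the standard temperature-scaled formulation commonly used for this kind of compression is shown below (PyTorch).

    # Standard logit distillation: the small student matches the softened
    # output distribution of the large teacher plus the hard labels.
    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, targets, T=2.0, alpha=0.5):
        # soft-target loss: KL between softened student and teacher distributions
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)
        # hard-target loss: ordinary cross-entropy against the labels
        hard = F.cross_entropy(student_logits, targets)
        return alpha * soft + (1 - alpha) * hard

The temperature T and mixing weight alpha are free hyperparameters; the values above are placeholders.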
Abstract: Biases introduced into text by generative models have become an increasingly prominent topic in recent years. In this paper, we explore how machine translation might introduce bias into sentiments as classified by sentiment analysis models. To this end, we compare three open-access machine translation models across five languages on two parallel corpora to test whether the translation process causes a shift in the sentiment classes recognized in the texts. Although our statistical tests indicate shifts in the label probability distributions, we find none consistent enough to conclude that the translation process induces a bias.
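The abstract does not name the statistical test used; one simple way to test for a shift of this kind, sketched below under that assumption, is a chi-square test of homogeneity over the sentiment label counts of the source texts and their translations (scipy). The classifier outputs are hypothetical.

    import numpy as np
    from scipy.stats import chi2_contingency

    labels = ["negative", "neutral", "positive"]

    def label_counts(predicted_labels):
        return np.array([predicted_labels.count(l) for l in labels])

    # hypothetical sentiment-classifier outputs on a parallel corpus
    source_preds = ["positive", "neutral", "positive", "negative", "neutral"]
    translated_preds = ["neutral", "neutral", "positive", "negative", "negative"]

    table = np.vstack([label_counts(source_preds), label_counts(translated_preds)])
    chi2, p_value, dof, _ = chi2_contingency(table)
    print(f"chi2={chi2:.3f}, p={p_value:.3f}")  # small p suggests a label shift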
Abstract: This paper proposes Allophant, a multilingual phoneme recognizer. It requires only a phoneme inventory for cross-lingual transfer to a target language, enabling low-resource recognition. The architecture combines a compositional phone embedding approach with individually supervised phonetic attribute classifiers in a multi-task setup. We also introduce Allophoible, an extension of the PHOIBLE database. Combined with a distance-based mapping approach for grapheme-to-phoneme outputs, it allows us to train on PHOIBLE inventories directly. By training and evaluating on 34 languages, we find that multi-task learning improves the model's ability to handle unseen phonemes and phoneme inventories. On supervised languages, we achieve a phoneme error rate (PER) improvement of 11 percentage points (pp.) over a baseline without multi-task learning. Evaluation of zero-shot transfer on 84 languages yields a PER decrease of 2.63 pp. over the baseline.
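A minimal sketch of the multi-task supervision pattern described above is given below (PyTorch): a shared representation feeds a phoneme classifier plus one individually supervised head per phonetic attribute. The layer sizes, attribute set, and equal task weighting are assumptions; Allophant's actual architecture (including its compositional phone embeddings) is more involved.

    import torch
    import torch.nn as nn

    class MultiTaskHead(nn.Module):
        def __init__(self, hidden_dim, n_phonemes, attribute_sizes):
            super().__init__()
            self.phoneme_head = nn.Linear(hidden_dim, n_phonemes)
            # one individually supervised classifier per phonetic attribute,
            # e.g. {"voicing": 2, "place": 11, "manner": 8} (hypothetical)
            self.attribute_heads = nn.ModuleDict(
                {name: nn.Linear(hidden_dim, size)
                 for name, size in attribute_sizes.items()}
            )

        def forward(self, features):
            return (
                self.phoneme_head(features),
                {name: head(features) for name, head in self.attribute_heads.items()},
            )

    def multitask_loss(phoneme_logits, attr_logits, phoneme_targets, attr_targets):
        ce = nn.functional.cross_entropy
        loss = ce(phoneme_logits, phoneme_targets)
        for name, logits in attr_logits.items():
            loss = loss + ce(logits, attr_targets[name])  # equal weighting assumed
        return loss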
Abstract: This paper describes an end-to-end (E2E) neural architecture for the audio rendering of small portions of display content on low-resource personal computing devices. It is intended to address the problem of accessibility for vision-impaired or vision-distracted users at the hardware level. Neural image-to-text (ITT) and text-to-speech (TTS) approaches are reviewed, and a new technique is introduced to integrate them in a way that is both efficient and back-propagatable, yielding a non-autoregressive E2E image-to-speech (ITS) neural network that is compact and trainable. Experimental results show that, compared with the non-E2E approach, the proposed E2E system is 29% faster and uses 19% fewer parameters, at the cost of a 2% reduction in phone accuracy. A future direction for addressing this accuracy gap is presented.
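The abstract does not detail how the ITT and TTS stages are coupled while keeping gradients flowing; one common way to achieve this, sketched below purely as an assumed technique and not necessarily the paper's, is to feed the TTS embedding layer the expected embedding under the ITT softmax rather than a hard argmax.

    # Differentiable bridge between an ITT output and a TTS input: replace
    # non-differentiable argmax token selection with the expected embedding.
    import torch
    import torch.nn.functional as F

    def soft_bridge(itt_logits, tts_embedding):
        """itt_logits: (batch, seq, vocab); tts_embedding: nn.Embedding over
        the same vocabulary. Returns (batch, seq, embed_dim)."""
        probs = F.softmax(itt_logits, dim=-1)   # differentiable "tokens"
        return probs @ tts_embedding.weight     # expected embedding

At inference, a hard argmax can replace the soft mixture; during training, gradients from the TTS loss flow back through the softmax into the ITT stage.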
Abstract: Deep neural networks (DNNs) have recently been widely used in speaker recognition systems, achieving state-of-the-art performance on various benchmarks. The x-vector architecture is especially popular in this research community due to its excellent performance and manageable computational complexity. In this paper, we present the lrx-vector system, a low-rank factorized version of the x-vector embedding network. The primary objective of this topology is to further reduce the memory requirement of the speaker recognition system. We discuss the use of knowledge distillation for training the lrx-vector system and compare it against low-rank factorization with singular value decomposition (SVD). On the VOiCES 2019 far-field corpus, we reduced the number of weights by 28% compared to the full-rank x-vector system while keeping the recognition rate constant (1.83% EER).
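A minimal sketch of SVD-based low-rank factorization of a single linear layer is shown below (PyTorch): the weight matrix W is replaced by two smaller factors initialized from its truncated SVD. The ranks used for the actual lrx-vector layers are not given in the abstract; rank is a free parameter here.

    import torch
    import torch.nn as nn

    def factorize_linear(layer: nn.Linear, rank: int) -> nn.Sequential:
        """Replace weight W (out x in) with B @ A, A: (rank x in), B: (out x rank)."""
        U, S, Vh = torch.linalg.svd(layer.weight.data, full_matrices=False)
        A = torch.diag(S[:rank].sqrt()) @ Vh[:rank]    # (rank, in)
        B = U[:, :rank] @ torch.diag(S[:rank].sqrt())  # (out, rank)
        first = nn.Linear(layer.in_features, rank, bias=False)
        second = nn.Linear(rank, layer.out_features, bias=layer.bias is not None)
        first.weight.data.copy_(A)
        second.weight.data.copy_(B)
        if layer.bias is not None:
            second.bias.data.copy_(layer.bias.data)
        return nn.Sequential(first, second)

Storage drops from out*in weights to rank*(out+in), so the rank directly controls the memory/accuracy trade-off; in this kind of pipeline the factorized network is typically fine-tuned (e.g., with distillation) after replacement.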