Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Cristian Tejedor-Garcia

Evaluating the Usefulness of Non-Diagnostic Speech Data for Developing Parkinson's Disease Classifiers

May 24, 2025

Terry Yi Zhong, Esther Janse, Cristian Tejedor-Garcia, Louis ten Bosch, Martha Larson

Abstract:Speech-based Parkinson's disease (PD) detection has gained attention for its automated, cost-effective, and non-intrusive nature. As research studies usually rely on data from diagnostic-oriented speech tasks, this work explores the feasibility of diagnosing PD on the basis of speech data not originally intended for diagnostic purposes, using the Turn-Taking (TT) dataset. Our findings indicate that TT can be as useful as diagnostic-oriented PD datasets like PC-GITA. We also investigate which specific dataset characteristics impact PD classification performance. The results show that concatenating audio recordings and balancing participants' gender and status distributions can be beneficial. Cross-dataset evaluation reveals that models trained on PC-GITA generalize poorly to TT, whereas models trained on TT perform better on PC-GITA. Furthermore, we provide insights into the high variability across folds, which is mainly due to large differences in individual speaker performance.

* Accepted for Interspeech 2025 (Camera-Ready)

Via

Access Paper or Ask Questions

Reading Miscue Detection in Primary School through Automatic Speech Recognition

Jun 11, 2024

Lingyun Gao, Cristian Tejedor-Garcia, Helmer Strik, Catia Cucchiarini

Abstract:Automatic reading diagnosis systems can benefit both teachers for more efficient scoring of reading exercises and students for accessing reading exercises with feedback more easily. However, there are limited studies on Automatic Speech Recognition (ASR) for child speech in languages other than English, and limited research on ASR-based reading diagnosis systems. This study investigates how efficiently state-of-the-art (SOTA) pretrained ASR models recognize Dutch native children speech and manage to detect reading miscues. We found that Hubert Large finetuned on Dutch speech achieves SOTA phoneme-level child speech recognition (PER at 23.1\%), while Whisper (Faster Whisper Large-v2) achieves SOTA word-level performance (WER at 9.8\%). Our findings suggest that Wav2Vec2 Large and Whisper are the two best ASR models for reading miscue detection. Specifically, Wav2Vec2 Large shows the highest recall at 0.83, whereas Whisper exhibits the highest precision at 0.52 and an F1 score of 0.52.

* Proc. INTERSPEECH 2024, 1-5 September 2024. Kos Island, Greece

Via

Access Paper or Ask Questions

Automatic Speech Recognition of Non-Native Child Speech for Language Learning Applications

Jun 29, 2023

Simone Wills, Yu Bai, Cristian Tejedor-Garcia, Catia Cucchiarini, Helmer Strik

Abstract:Voicebots have provided a new avenue for supporting the development of language skills, particularly within the context of second language learning. Voicebots, though, have largely been geared towards native adult speakers. We sought to assess the performance of two state-of-the-art ASR systems, Wav2Vec2.0 and Whisper AI, with a view to developing a voicebot that can support children acquiring a foreign language. We evaluated their performance on read and extemporaneous speech of native and non-native Dutch children. We also investigated the utility of using ASR technology to provide insight into the children's pronunciation and fluency. The results show that recent, pre-trained ASR transformer-based models achieve acceptable performance from which detailed feedback on phoneme pronunciation quality can be extracted, despite the challenging nature of child and non-native speech.

* 12th Symposium on Languages, Applications and Technologies (SLATE 2023). Editors: Alberto Sim\~oes, Mario Marcelo Ber\'on, and Filipe Portela; Article No. 11; pp. 11:1-11:8
* Published on SLATE 2023, Esmad, Politecnico Do Porto, Portugal, 26-28 June, 2023, pp: 11:1-11:8

Via

Access Paper or Ask Questions

An ASR-Based Tutor for Learning to Read: How to Optimize Feedback to First Graders

Jun 07, 2023

Yu Bai, Cristian Tejedor-Garcia, Ferdy Hubers, Catia Cucchiarini, Helmer Strik

Abstract:The interest in employing automatic speech recognition (ASR) in applications for reading practice has been growing in recent years. In a previous study, we presented an ASR-based Dutch reading tutor application that was developed to provide instantaneous feedback to first-graders learning to read. We saw that ASR has potential at this stage of the reading process, as the results suggested that pupils made progress in reading accuracy and fluency by using the software. In the current study, we used children's speech from an existing corpus (JASMIN) to develop two new ASR systems, and compared the results to those of the previous study. We analyze correct/incorrect classification of the ASR systems using human transcripts at word level, by means of evaluation measures such as Cohen's Kappa, Matthews Correlation Coefficient (MCC), precision, recall and F-measures. We observe improvements for the newly developed ASR systems regarding the agreement with human-based judgment and correct rejection (CR). The accuracy of the ASR systems varies for different reading tasks and word types. Our results suggest that, in the current configuration, it is difficult to classify isolated words. We discuss these results, possible ways to improve our systems and avenues for future research.

* In: Karpov A., Potapova R. (eds) Speech and Computer. SPECOM 2021. Lecture Notes in Computer Science, vol 12997. Springer, Cham
* Published (double-blind peer-reviewed) on SPECOM 2021

Via

Access Paper or Ask Questions

Alzheimer Disease Classification through ASR-based Transcriptions: Exploring the Impact of Punctuation and Pauses

Jun 06, 2023

Lucía Gómez-Zaragozá, Simone Wills, Cristian Tejedor-Garcia, Javier Marín-Morales, Mariano Alcañiz, Helmer Strik

Figure 1 for Alzheimer Disease Classification through ASR-based Transcriptions: Exploring the Impact of Punctuation and Pauses

Figure 2 for Alzheimer Disease Classification through ASR-based Transcriptions: Exploring the Impact of Punctuation and Pauses

Abstract:Alzheimer's Disease (AD) is the world's leading neurodegenerative disease, which often results in communication difficulties. Analysing speech can serve as a diagnostic tool for identifying the condition. The recent ADReSS challenge provided a dataset for AD classification and highlighted the utility of manual transcriptions. In this study, we used the new state-of-the-art Automatic Speech Recognition (ASR) model Whisper to obtain the transcriptions, which also include automatic punctuation. The classification models achieved test accuracy scores of 0.854 and 0.833 combining the pretrained FastText word embeddings and recurrent neural networks on manual and ASR transcripts respectively. Additionally, we explored the influence of including pause information and punctuation in the transcriptions. We found that punctuation only yielded minor improvements in some cases, whereas pause encoding aided AD classification for both manual and ASR transcriptions across all approaches investigated.

Via

Access Paper or Ask Questions

Automatic Assessment of Oral Reading Accuracy for Reading Diagnostics

Jun 06, 2023

Bo Molenaar, Cristian Tejedor-Garcia, Helmer Strik, Catia Cucchiarini

Abstract:Automatic assessment of reading fluency using automatic speech recognition (ASR) holds great potential for early detection of reading difficulties and subsequent timely intervention. Precise assessment tools are required, especially for languages other than English. In this study, we evaluate six state-of-the-art ASR-based systems for automatically assessing Dutch oral reading accuracy using Kaldi and Whisper. Results show our most successful system reached substantial agreement with human evaluations (MCC = .63). The same system reached the highest correlation between forced decoding confidence scores and word correctness (r = .45). This system's language model (LM) consisted of manual orthographic transcriptions and reading prompts of the test data, which shows that including reading errors in the LM improves assessment performance. We discuss the implications for developing automatic assessment systems and identify possible avenues of future research.

Via

Access Paper or Ask Questions