Abstract:Autism Spectrum Disorder (ASD) is characterized by challenges in social communication and restricted patterns, with motor abnormalities gaining traction for early detection. However, kinematic analysis in ASD is limited, often lacking robust validation and relying on hand-crafted features for single tasks, leading to inconsistencies across studies. Thus, end-to-end models have become promising methods to overcome the need for feature engineering. Our aim is to assess both approaches across various kinematic tasks to measure the efficacy of commonly used features in ASD assessment, while comparing them to end-to-end models. Specifically, we developed a virtual reality environment with multiple motor tasks and trained models using both classification approaches. We prioritized a reliable validation framework with repeated cross-validation. Our comparative analysis revealed that hand-crafted features outperformed our deep learning approach in specific tasks, achieving a state-of-the-art area under the curve (AUC) of 0.90$\pm$0.06. Conversely, end-to-end models provided more consistent results with less variability across all VR tasks, demonstrating domain generalization and reliability, with a maximum task AUC of 0.89$\pm$0.06. These findings show that end-to-end models enable less variable and context-independent ASD assessments without requiring domain knowledge or task specificity. However, they also recognize the effectiveness of hand-crafted features in specific task scenarios.
Abstract:Alzheimer's Disease (AD) is the world's leading neurodegenerative disease, which often results in communication difficulties. Analysing speech can serve as a diagnostic tool for identifying the condition. The recent ADReSS challenge provided a dataset for AD classification and highlighted the utility of manual transcriptions. In this study, we used the new state-of-the-art Automatic Speech Recognition (ASR) model Whisper to obtain the transcriptions, which also include automatic punctuation. The classification models achieved test accuracy scores of 0.854 and 0.833 combining the pretrained FastText word embeddings and recurrent neural networks on manual and ASR transcripts respectively. Additionally, we explored the influence of including pause information and punctuation in the transcriptions. We found that punctuation only yielded minor improvements in some cases, whereas pause encoding aided AD classification for both manual and ASR transcriptions across all approaches investigated.