Abstract:Recent advancements in text-to-speech (TTS) models have aimed to streamline the two-stage process into a single-stage training approach. However, many single-stage models still lag behind in audio quality, particularly when handling Kurdish text and speech. There is a critical need to enhance text-to-speech conversion for the Kurdish language, particularly for the Sorani dialect, which has been relatively neglected and is underrepresented in recent text-to-speech advancements. This study introduces an end-to-end TTS model for efficiently generating high-quality Kurdish audio. The proposed method leverages a variational autoencoder (VAE) that is pre-trained for audio waveform reconstruction and is augmented by adversarial training. This involves aligning the prior distribution established by the pre-trained encoder with the posterior distribution of the text encoder within latent variables. Additionally, a stochastic duration predictor is incorporated to imbue synthesized Kurdish speech with diverse rhythms. By aligning latent distributions and integrating the stochastic duration predictor, the proposed method facilitates the real-time generation of natural Kurdish speech audio, offering flexibility in pitches and rhythms. Empirical evaluation via the mean opinion score (MOS) on a custom dataset confirms the superior performance of our approach (MOS of 3.94) compared with that of a one-stage system and other two-staged systems as assessed through a subjective human evaluation.
Abstract:The field of analyzing performance is very important and sensitive in particular when it is related to the performance of lecturers in academic institutions. Locating the weak points of lecturers through a system that provides an early warning to notify or reward the lecturers with warned or punished notices will help them to improve their weaknesses, leads to a better quality in the institutions. The current system has major issues in the higher education at Salahaddin University-Erbil (SUE) in Kurdistan-Iraq. These issues are: first, the assessment of lecturers' activities is conducted traditionally via the Quality Assurance Teams at different departments and colleges at the university, second, the outcomes in some cases of lecturers' performance provoke a low level of acceptance among lectures, as these cases are reflected and viewed by some academic communities as unfair cases, and finally, the current system is not accurate and vigorous. In this paper, Particle Swarm Optimization Neural Network is used to assess performance of lecturers in more fruitful way and also to enhance the accuracy of recognition system. Different real and novel data sets are collected from SUE. The prepared datasets preprocessed and important features are then fed as input source to the training and testing phases. Particle Swarm Optimization is used to find the best weights and biases in the training phase of the neural network. The best accuracy rate obtained in the test phase is 98.28 %.