Abstract: This paper introduces a high-quality open-source speech synthesis dataset for Kazakh, a low-resource language spoken by over 13 million people worldwide. The dataset consists of about 93 hours of transcribed audio recordings spoken by two professional speakers (female and male). It is the first publicly available large-scale dataset developed to promote Kazakh text-to-speech (TTS) applications in both academia and industry. In this paper, we share our experience by describing the dataset development procedures and the challenges we faced, and we discuss important future directions. To demonstrate the reliability of our dataset, we built baseline end-to-end TTS models and evaluated them using the subjective mean opinion score (MOS) measure. Evaluation results show that the best TTS models trained on our dataset achieve a MOS above 4 for both speakers, which makes them suitable for practical use. The dataset, training recipe, and pretrained TTS models are freely available.
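As an illustration of the MOS measure mentioned in the abstract above, the following minimal sketch averages listener ratings on a 1-to-5 scale and reports a 95% confidence half-width under a normal approximation. The function name `mean_opinion_score` and the ratings are hypothetical and are not taken from the paper's evaluation; the authors' exact protocol is not described in the abstract.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (MOS, 95% CI half-width) for listener ratings on a 1-5 scale.

    The confidence interval uses a normal approximation (z = 1.96),
    which is an assumption of this sketch.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate a spread")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    mos = statistics.mean(ratings)
    # Sample standard deviation divided by sqrt(n) gives the standard error.
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings from eight listeners for one synthesized utterance.
ratings = [5, 4, 4, 5, 3, 4, 5, 4]
mos, ci = mean_opinion_score(ratings)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```

In practice, ratings are usually pooled over many utterances and listeners per system before averaging, so a single-utterance score like this one is only indicative.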
Abstract: We present an open-source speech corpus for the Kazakh language. The Kazakh speech corpus (KSC) contains around 335 hours of transcribed audio comprising over 154,000 utterances spoken by participants from different regions, age groups, and genders. It was carefully inspected by native Kazakh speakers to ensure high quality. The KSC is the largest publicly available database developed to advance various Kazakh speech and language processing applications. In this paper, we first describe the data collection and preprocessing procedures, followed by the database specifications. We also share our experience and the challenges faced during database construction. To demonstrate the reliability of the database, we performed preliminary speech recognition experiments. The experimental results suggest that the quality of the audio and transcripts is promising. To enable experiment reproducibility and ease use of the corpus, we also released the ESPnet recipe.
Abstract: In this work, we present an end-to-end deep learning framework for X-ray image diagnosis. As the first step, our system determines whether a submitted image is an X-ray or not. After it classifies the type of the X-ray, it runs the dedicated abnormality classification network. In this work, we focus only on chest X-rays for abnormality classification. However, the system can easily be extended to other X-ray types. Our deep learning classifiers are based on the DenseNet-121 architecture. The test set accuracies obtained for the 'X-ray or Not', 'X-ray Type Classification', and 'Chest Abnormality Classification' tasks are 0.987, 0.976, and 0.947, respectively, resulting in an end-to-end accuracy of 0.91. To achieve better results than the state of the art in 'Chest Abnormality Classification', we utilize the new RAdam optimizer. We also use Gradient-weighted Class Activation Mapping for visual explanation of the results. Our results show the feasibility of a generalized online projectional radiography diagnosis system.
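The reported end-to-end accuracy of 0.91 is consistent with multiplying the three stage accuracies. The sketch below reproduces that arithmetic under two assumptions that the abstract does not state explicitly: a sample counts as correct only if every stage in the cascade is correct, and the stage errors are independent. The variable names are illustrative.

```python
import math

# Per-stage test set accuracies as reported in the abstract.
stage_accuracy = {
    "X-ray or Not": 0.987,
    "X-ray Type Classification": 0.976,
    "Chest Abnormality Classification": 0.947,
}

# Assumption: stage errors are independent, so the probability that all
# three stages are correct is the product of the per-stage accuracies.
end_to_end = math.prod(stage_accuracy.values())
print(f"Estimated end-to-end accuracy: {end_to_end:.3f}")
```

Under these assumptions the product rounds to 0.91, which matches the reported figure. Because errors multiply in this way, every additional stage in such a cascade lowers the ceiling on overall accuracy.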