Jan Silovsky

Cross-lingual Knowledge Transfer and Iterative Pseudo-labeling for Low-Resource Speech Recognition with Transducers

May 23, 2023

Jan Silovsky, Liuhui Deng, Arturo Argueta, Tresi Arvizo, Roger Hsiao, Sasha Kuznietsov, Yiu-Chang Lin, Xiaoqiang Xiao, Yuanyuan Zhang

Abstract: Voice technology has become ubiquitous in recent years. However, accuracy, and hence user experience, varies significantly across languages, which makes the technology less inclusive for some languages than for others. The availability of data for a given language is one of the key factors affecting accuracy, especially when training all-neural end-to-end automatic speech recognition (ASR) systems. Cross-lingual knowledge transfer and iterative pseudo-labeling are two techniques that have been shown to improve the accuracy of ASR systems, in particular for low-resource languages such as Ukrainian. Our goal is to train an all-neural Transducer-based ASR system to replace a DNN-HMM hybrid system without any manually annotated training data. We show that a Transducer system trained on transcripts produced by the hybrid system achieves an 18% reduction in word error rate. By combining cross-lingual knowledge transfer from related languages with iterative pseudo-labeling, we achieve a 35% reduction in error rate.
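The recipe in this abstract (seed transcripts from the hybrid system, then repeated re-transcription and retraining) can be outlined as a training loop. The following is a minimal Python sketch under stated assumptions, not the authors' pipeline: `train_fn`, `transcribe_fn`, and the confidence filter are hypothetical placeholders, and cross-lingual transfer is assumed to mean starting from a model already trained on related languages.

```python
from typing import Callable, List, Tuple, TypeVar

M = TypeVar("M")  # opaque model type (hypothetical)
Pair = Tuple[str, str]  # (utterance id, transcript)


def iterative_pseudo_labeling(
    init_model: M,
    seed_labels: List[Pair],
    unlabeled: List[str],
    train_fn: Callable[[M, List[Pair]], M],
    transcribe_fn: Callable[[M, str], Tuple[str, float]],
    num_rounds: int = 2,
    min_confidence: float = 0.0,
) -> M:
    """Sketch of iterative pseudo-labeling with hypothetical callbacks.

    init_model: assumed to carry weights transferred from related languages.
    seed_labels: transcripts produced by an existing (e.g. hybrid) system.
    """
    # Round 0: adapt the transferred model to the seed transcripts.
    model = train_fn(init_model, seed_labels)
    for _ in range(num_rounds):
        pseudo_labels: List[Pair] = []
        for utt in unlabeled:
            # Re-transcribe with the current model; keep confident hypotheses.
            hyp, confidence = transcribe_fn(model, utt)
            if confidence >= min_confidence:
                pseudo_labels.append((utt, hyp))
        # Retrain on the refreshed pseudo-labels, continuing from the current model.
        model = train_fn(model, pseudo_labels)
    return model
```

Whether each round continues from the previous weights or restarts from the transferred model is a design choice; this sketch continues from the previous round.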

Learning from Noisy Labels with Noise Modeling Network

May 01, 2020

Zhuolin Jiang, Jan Silovsky, Man-Hung Siu, William Hartmann, Herbert Gish, Sancar Adali

Abstract: Multi-label image classification has generated significant interest in recent years, and the performance of such systems often suffers from the not-infrequent occurrence of incorrect or missing labels in the training data. In this paper, we extend the state of the art in training classifiers to deal jointly with both forms of errorful data. We accomplish this by modeling noisy and missing labels in multi-label images with a new Noise Modeling Network (NMN) that follows our convolutional neural network (CNN) and integrates with it to form an end-to-end deep learning system that jointly learns the noise distribution and the CNN parameters. The NMN learns the distribution of noise patterns directly from the noisy data, without the need for any clean training data. The NMN can model label noise that depends only on the true label, or that also depends on the image features. We show that the integrated NMN/CNN learning system consistently improves classification performance, for different levels of label noise, on the MSR-COCO and MSR-VTT datasets. We also show that performance improvements under label noise are obtained when multiple instance learning methods are used.
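For the simpler case the abstract mentions, noise that depends only on the true label, a noise layer can sit on top of the classifier's per-label probabilities and map them to probabilities of the observed (noisy) labels; the loss is computed against the noisy labels, while the clean probabilities are what one would use at test time. The sketch below is an illustrative simplification in plain Python, not the published NMN architecture: the per-label `miss_rates` (label truly present but not annotated) and `false_alarm_rates` (label annotated but truly absent) are hypothetical names and are held fixed here, whereas an end-to-end system would learn them jointly with the CNN.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def noisy_label_probs(
    logits: List[float], miss_rates: List[float], false_alarm_rates: List[float]
) -> List[float]:
    """P(observed label k = 1 | x) under label-dependent noise (illustrative)."""
    probs = []
    for z, miss, false_alarm in zip(logits, miss_rates, false_alarm_rates):
        p_clean = sigmoid(z)  # classifier's estimate of P(true label k = 1 | x)
        # Observed 1 if the label is present and not missed,
        # or absent but spuriously annotated.
        probs.append(p_clean * (1.0 - miss) + (1.0 - p_clean) * false_alarm)
    return probs


def noisy_bce(
    logits: List[float],
    noisy_labels: List[int],
    miss_rates: List[float],
    false_alarm_rates: List[float],
    eps: float = 1e-12,
) -> float:
    """Mean binary cross-entropy of the noisy labels under the noise model."""
    q = noisy_label_probs(logits, miss_rates, false_alarm_rates)
    total = 0.0
    for q_k, y in zip(q, noisy_labels):
        total -= y * math.log(q_k + eps) + (1 - y) * math.log(1.0 - q_k + eps)
    return total / len(q)
```

With both rates set to zero, `noisy_bce` reduces to ordinary binary cross-entropy on the classifier's outputs.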

Improving Language Identification for Multilingual Speakers

Jan 29, 2020

Andrew Titus, Jan Silovsky, Nanxin Chen, Roger Hsiao, Mary Young, Arnab Ghoshal

Abstract: Spoken language identification (LID) technologies have improved in recent years, from discriminating largely distinct languages to discriminating highly similar languages or even dialects of the same language. One aspect that has been mostly neglected, however, is discriminating languages for multilingual speakers, even though they are a primary target audience of many systems that use LID technologies. As we show in this work, LID systems can have a high average accuracy for most combinations of languages while greatly underperforming for others when accented speech is present. We address this by using coarser-grained targets for the acoustic LID model and integrating its outputs with interaction context signals in a context-aware model that tailors the system to each user. This combined system achieves an average accuracy of 97% across all language combinations while improving worst-case accuracy by over 60% relative to our baseline.

* 5 pages, 2 figures. Submitted to ICASSP 2020
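One simple way to combine an acoustic LID model's outputs with per-user context, shown here only as an illustrative assumption rather than the authors' context-aware model, is to restrict the decision to the languages a user is known to use and reweight the acoustic posteriors by a context-derived prior. The function and parameter names (`contextual_lid`, `user_languages`, `context_prior`) are hypothetical.

```python
from typing import Dict, List, Optional


def contextual_lid(
    acoustic_posteriors: Dict[str, float],
    user_languages: List[str],
    context_prior: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Rescore acoustic LID posteriors with user context (illustrative sketch)."""
    if not user_languages:
        raise ValueError("user_languages must not be empty")
    scores = {}
    for lang in user_languages:
        # Uniform weight when no context signal is available for this language.
        prior = context_prior.get(lang, 1.0) if context_prior else 1.0
        scores[lang] = acoustic_posteriors.get(lang, 0.0) * prior
    total = sum(scores.values())
    if total == 0.0:
        # No evidence at all: fall back to a uniform distribution.
        return {lang: 1.0 / len(scores) for lang in scores}
    # Renormalize over the user's languages only.
    return {lang: s / total for lang, s in scores.items()}
```

The rescoring treats the acoustic posteriors as if they were likelihoods under a flat prior, which is itself an assumption; a learned context-aware model could capture richer interactions between signals.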

Towards a New Understanding of the Training of Neural Networks with Mislabeled Training Data

Sep 18, 2019

Herbert Gish, Jan Silovsky, Man-Ling Sung, Man-Hung Siu, William Hartmann, Zhuolin Jiang

Abstract: We investigate the problem of machine learning with mislabeled training data. We aim to make the effects of mislabeled training data better understood by analyzing the basic model and the equations that characterize the problem. This includes results about the ability of the noisy model to make the same decisions as the clean model, and about the effects of noise on model performance. In addition to providing better insights, we show that the Maximum Likelihood (ML) estimates of the parameters of the noisy model determine those of the clean model. This property follows from the ML invariance property and leads to an approach to developing a classifier when the training data are mislabeled: namely, train the classifier on the noisy data and adjust the decision threshold based on the noise levels and/or class priors. We show how our approach to mislabeled training works with multi-layer perceptrons (MLPs).

* 13 pages with 3 figures
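For binary classification with class-conditional label flips, the threshold adjustment described in this abstract can be made concrete. Assume (this notation is ours, not the paper's) that a true negative is labeled positive with probability rho0 and a true positive is labeled negative with probability rho1, with rho0 + rho1 < 1. The noisy posterior is then q(x) = rho0 + (1 - rho0 - rho1) p(x), an increasing function of the clean posterior p(x), so thresholding q at rho0 + (1 - rho0 - rho1) t reproduces the clean decision p(x) > t, and inverting the same relation recovers p(x) from q(x). A minimal sketch under these assumptions:

```python
def _check_rates(rho0: float, rho1: float) -> None:
    # The mapping from clean to noisy posterior is invertible only if rho0 + rho1 < 1.
    if not (0.0 <= rho0 and 0.0 <= rho1 and rho0 + rho1 < 1.0):
        raise ValueError("need rho0, rho1 >= 0 and rho0 + rho1 < 1")


def adjusted_threshold(rho0: float, rho1: float, clean_threshold: float = 0.5) -> float:
    """Threshold on the noisy posterior q that matches p > clean_threshold."""
    _check_rates(rho0, rho1)
    return rho0 + (1.0 - rho0 - rho1) * clean_threshold


def clean_posterior(q: float, rho0: float, rho1: float) -> float:
    """Invert q = rho0 + (1 - rho0 - rho1) * p, clipped to [0, 1]."""
    _check_rates(rho0, rho1)
    p = (q - rho0) / (1.0 - rho0 - rho1)
    return min(1.0, max(0.0, p))


def decide(q: float, rho0: float, rho1: float, clean_threshold: float = 0.5) -> bool:
    """Classify using a model trained on noisy labels (q = its posterior)."""
    return q > adjusted_threshold(rho0, rho1, clean_threshold)
```

The `clean_threshold` argument is where class priors or misclassification costs can shift the decision away from 0.5.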
