Andrew Abel

Deep Speaker Vector Normalization with Maximum Gaussianality Training

Oct 30, 2020

Yunqi Cai, Lantian Li, Dong Wang, Andrew Abel

Abstract: Deep speaker embedding represents the state-of-the-art technique for speaker recognition. A key problem with this approach is that the resulting deep speaker vectors tend to be irregularly distributed. In previous research, we proposed a deep normalization approach based on a new discriminative normalization flow (DNF) model, by which the distributions of individual speakers are arguably transformed to homogeneous Gaussians. This normalization was demonstrated to be effective, but despite this remarkable success, we empirically found that the latent codes produced by the DNF model are generally neither homogeneous nor Gaussian, although the model has assumed so. In this paper, we argue that this problem is largely attributed to the maximum-likelihood (ML) training criterion of the DNF model, which aims to maximize the likelihood of the observations but not necessarily improve the Gaussianality of the latent codes. We therefore propose a new Maximum Gaussianality (MG) training approach that directly maximizes the Gaussianality of the latent codes. Our experiments on two data sets, SITW and CNCeleb, demonstrate that our new MG training approach can deliver much better performance than the previous ML training, and exhibits improved domain generalizability, particularly with regard to cosine scoring.
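The abstract does not say how Gaussianality is measured, so the following is only an illustrative sketch. It scores a set of one-dimensional latent codes by squared skewness plus squared excess kurtosis, both of which are zero for an exact Gaussian. The function names and the penalty itself are assumptions made for illustration and should not be read as the paper's MG criterion.

```python
import random

def skewness(xs):
    """Standardized third central moment (0 for a Gaussian)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return sum((x - mean) ** 3 for x in xs) / n / var ** 1.5

def excess_kurtosis(xs):
    """Standardized fourth central moment minus 3 (0 for a Gaussian)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return sum((x - mean) ** 4 for x in xs) / n / var ** 2 - 3.0

def gaussianality_penalty(xs):
    """Hypothetical proxy: smaller means closer to Gaussian in 3rd/4th moments."""
    return skewness(xs) ** 2 + excess_kurtosis(xs) ** 2

rng = random.Random(0)
gaussian_codes = [rng.gauss(0.0, 1.0) for _ in range(20000)]
skewed_codes = [rng.expovariate(1.0) for _ in range(20000)]

print("Gaussian codes penalty:", gaussianality_penalty(gaussian_codes))
print("Skewed codes penalty:  ", gaussianality_penalty(skewed_codes))
```

Under these assumptions, the Gaussian sample should score close to zero while the exponential sample scores much higher, which is the kind of gap a Gaussianality-driven objective would try to close.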

Deep Normalization for Speaker Vectors

Apr 07, 2020

Yunqi Cai, Lantian Li, Dong Wang, Andrew Abel

Abstract: Deep speaker embedding has demonstrated state-of-the-art performance in audio speaker recognition (SRE). However, one potential issue with this approach is that the speaker vectors derived from deep embedding models tend to be non-Gaussian for each individual speaker, and non-homogeneous for distributions of different speakers. These irregular distributions can seriously impact SRE performance, especially with the popular PLDA scoring method, which assumes homogeneous Gaussian distribution. In this paper, we argue that deep speaker vectors require deep normalization, and propose a deep normalization approach based on a novel discriminative normalization flow (DNF) model. We demonstrate the effectiveness of the proposed approach with experiments using the widely used SITW and CNCeleb corpora. In these experiments, the DNF-based normalization delivered substantial performance gains and also showed strong generalization capability in out-of-domain tests.
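As a rough illustration of the idea behind a flow-based normalizer, the sketch below uses a single invertible affine map in one dimension in place of a deep flow. It assumes that "homogeneous Gaussians" means each speaker has its own latent mean while all speakers share the same unit variance; that reading, and every name and value in the code, is an assumption rather than the paper's model.

```python
import math

def gaussian_logpdf(x, mean, std):
    """Log density of a univariate Gaussian N(mean, std^2) at x."""
    return -0.5 * math.log(2 * math.pi) - math.log(std) - 0.5 * ((x - mean) / std) ** 2

def flow_forward(x, scale, shift):
    """Invertible affine map from observation x to latent code z (scale must be nonzero)."""
    return (x - shift) / scale

def flow_log_likelihood(x, scale, shift, class_mean):
    """Change of variables: log p(x) = log N(z; class_mean, 1) + log|dz/dx|."""
    z = flow_forward(x, scale, shift)
    log_det = -math.log(abs(scale))
    return gaussian_logpdf(z, class_mean, 1.0) + log_det

# Hypothetical speaker-specific latent means; the unit variance is shared by all speakers.
speaker_means = {"spk_a": -2.0, "spk_b": 2.0}
x = 3.5
ll_a = flow_log_likelihood(x, scale=1.5, shift=0.5, class_mean=speaker_means["spk_a"])
ll_b = flow_log_likelihood(x, scale=1.5, shift=0.5, class_mean=speaker_means["spk_b"])
print("log-likelihood under spk_a:", ll_a)
print("log-likelihood under spk_b:", ll_b)
```

Here the observation maps to a latent code of 2.0, so it is far more likely under `spk_b` than under `spk_a`. A real flow would stack many learned invertible layers, but the log-determinant bookkeeping follows the same pattern.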

Phonetic Temporal Neural Model for Language Identification

Aug 25, 2017

Zhiyuan Tang, Dong Wang, Yixiang Chen, Lantian Li, Andrew Abel

Abstract: Deep neural models, particularly the LSTM-RNN model, have shown great potential for language identification (LID). However, the use of phonetic information has been largely overlooked by most existing neural LID methods, although this information has been used very successfully in conventional phonetic LID systems. We present a phonetic temporal neural model for LID, which is an LSTM-RNN LID system that accepts phonetic features produced by a phone-discriminative DNN as the input, rather than raw acoustic features. This new model is similar to traditional phonetic LID methods, but the phonetic knowledge here is much richer: it is at the frame level and involves compacted information of all phones. Our experiments conducted on the Babel database and the AP16-OLR database demonstrate that the temporal phonetic neural approach is very effective, and significantly outperforms existing acoustic neural models. It also outperforms the conventional i-vector approach on short utterances and in noisy conditions.

* Submitted to TASLP

Memory-augmented Neural Machine Translation

Aug 07, 2017

Yang Feng, Shiyue Zhang, Andi Zhang, Dong Wang, Andrew Abel

Abstract: Neural machine translation (NMT) has achieved notable success in recent times; however, it is also widely recognized that this approach has limitations in handling infrequent words and word pairs. This paper presents a novel memory-augmented NMT (M-NMT) architecture, which stores knowledge about how words (usually infrequently encountered ones) should be translated in a memory and then uses that knowledge to assist the neural model. We use this memory mechanism to combine the knowledge learned from a conventional statistical machine translation system and the rules learned by an NMT system, and also propose a solution for out-of-vocabulary (OOV) words based on this framework. Our experiments on two Chinese-English translation tasks demonstrated that the M-NMT architecture outperformed the NMT baseline by $9.0$ and $2.7$ BLEU points on the two tasks, respectively. Additionally, we found that this architecture resulted in a much more effective OOV treatment compared to competitive methods.
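The abstract does not specify how the memory's knowledge is merged with the neural model's prediction. One simple possibility, shown here purely as an assumption, is to interpolate linearly between the NMT output distribution and a distribution read from the memory. The function name, the weight, and the toy vocabulary are all hypothetical.

```python
def combine_with_memory(nmt_probs, memory_probs, memory_weight):
    """Linearly interpolate two distributions over the same target vocabulary."""
    if not 0.0 <= memory_weight <= 1.0:
        raise ValueError("memory_weight must lie in [0, 1]")
    vocab = set(nmt_probs) | set(memory_probs)
    return {
        word: (1.0 - memory_weight) * nmt_probs.get(word, 0.0)
        + memory_weight * memory_probs.get(word, 0.0)
        for word in vocab
    }

# Toy example: the NMT model under-weights a rare word that the memory knows well.
nmt_probs = {"the": 0.6, "river": 0.3, "estuary": 0.1}
memory_probs = {"estuary": 0.9, "river": 0.1}

combined = combine_with_memory(nmt_probs, memory_probs, memory_weight=0.5)
best = max(combined, key=combined.get)
print(best, combined[best])
```

With an equal weighting, the rare word stored in the memory overtakes the frequent word preferred by the NMT distribution alone, which is the behavior a memory of infrequent translations is meant to provide.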

Collaborative Learning for Language and Speaker Recognition

May 23, 2017

Lantian Li, Zhiyuan Tang, Dong Wang, Andrew Abel, Yang Feng, Shiyue Zhang

Abstract: This paper presents a unified model that performs language and speaker recognition simultaneously. The model is based on a multi-task recurrent neural network in which the output of one task is fed as the input of the other, leading to a collaborative learning framework that can improve both language and speaker recognition by letting each task borrow information from the other. Our experiments demonstrated that the multi-task model outperforms the task-specific models on both tasks.
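The cross-feeding described above can be sketched with two scalar recurrent units, each receiving the shared input together with the other task's output from the previous time step. This is a wiring illustration only: the tanh units, scalar states, and weight values are assumptions, and the paper's actual network is not reproduced here.

```python
import math

def collaborative_step(x, prev_lang, prev_spk, params):
    """One time step: each task sees the input plus the other task's previous output."""
    lang = math.tanh(params["w_lang"] * x + params["u_lang"] * prev_spk)
    spk = math.tanh(params["w_spk"] * x + params["u_spk"] * prev_lang)
    return lang, spk

def run_collaborative(inputs, params):
    """Unroll the two coupled units over a sequence, starting from zero outputs."""
    lang, spk = 0.0, 0.0
    outputs = []
    for x in inputs:
        lang, spk = collaborative_step(x, lang, spk, params)
        outputs.append((lang, spk))
    return outputs

# Hypothetical weights: w_* act on the input, u_* on the other task's feedback.
params = {"w_lang": 0.8, "u_lang": 0.5, "w_spk": -0.6, "u_spk": 0.7}
outputs = run_collaborative([0.2, -0.1, 0.4], params)
for step, (lang, spk) in enumerate(outputs):
    print(step, lang, spk)
```

Setting `u_lang` and `u_spk` to zero decouples the two units, which gives a quick way to compare the coupled model against task-specific baselines in this toy setting.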

Flexible and Creative Chinese Poetry Generation Using Neural Memory

May 10, 2017

Jiyuan Zhang, Yang Feng, Dong Wang, Yang Wang, Andrew Abel, Shiyue Zhang, Andi Zhang

Abstract: It has been shown that Chinese poems can be successfully generated by sequence-to-sequence neural models, particularly with the attention mechanism. A potential problem of this approach, however, is that neural models can only learn abstract rules, while poem generation is a highly creative process that involves not only rules but also innovations for which pure statistical models are not appropriate in principle. This work proposes a memory-augmented neural model for Chinese poem generation, where the neural model and the augmented memory work together to balance the requirements of linguistic accordance and aesthetic innovation, leading to innovative generations that are still rule-compliant. In addition, it is found that the memory mechanism provides interesting flexibility that can be used to generate poems with different styles.

Probabilistic Belief Embedding for Knowledge Base Completion

May 22, 2015

Miao Fan, Qiang Zhou, Andrew Abel, Thomas Fang Zheng, Ralph Grishman

Abstract: This paper contributes a novel embedding model that measures the probability of each belief $\langle h,r,t,m\rangle$ in a large-scale knowledge repository by simultaneously learning distributed representations for entities ($h$ and $t$), relations ($r$), and the words in relation mentions ($m$). It facilitates knowledge completion by using simple vector operations to discover new beliefs. Given an imperfect belief, we can not only infer the missing entities and predict the unknown relations, but also assess the plausibility of the belief, using only the learnt embeddings of the remaining evidence. To demonstrate the scalability and effectiveness of our model, we conduct experiments on several large-scale repositories containing millions of beliefs from WordNet, Freebase and NELL, and compare it with other cutting-edge approaches on the tasks of entity inference, relation prediction and triplet classification, each with its respective metrics. Extensive experimental results show that the proposed model outperforms the state of the art with significant improvements.
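To make "simple vector operations" concrete, the sketch below infers a missing tail entity with a translation-style rule, where the tail is expected to lie near the head vector plus the relation vector (the assumption popularized by TransE). The abstract does not give the paper's scoring function, so this rule is a stand-in, the mention embeddings ($m$) are left out, and the hand-set vectors are hypothetical rather than learned.

```python
def infer_tail(head, relation, entity_vecs, relation_vecs):
    """Return the entity whose vector is closest to head + relation (squared L2)."""
    target = [h + r for h, r in zip(entity_vecs[head], relation_vecs[relation])]

    def sq_dist(name):
        return sum((t - e) ** 2 for t, e in zip(target, entity_vecs[name]))

    candidates = [name for name in entity_vecs if name != head]
    return min(candidates, key=sq_dist)

# Hand-set 2-D embeddings for illustration; real embeddings would be learned.
entity_vecs = {
    "paris": [1.0, 0.0],
    "france": [1.0, 1.0],
    "berlin": [-1.0, 0.0],
    "germany": [-1.0, 1.0],
}
relation_vecs = {"capital_of": [0.0, 1.0]}

print(infer_tail("paris", "capital_of", entity_vecs, relation_vecs))
print(infer_tail("berlin", "capital_of", entity_vecs, relation_vecs))
```

The same distance could in principle be turned into a plausibility score for a complete belief, for example by passing negative distances through a softmax over candidates, though that step is also an assumption here.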

* arXiv admin note: text overlap with arXiv:1503.08155
