David Nahamoo

Quantized-Dialog Language Model for Goal-Oriented Conversational Systems

Dec 26, 2018

R. Chulaka Gunasekara, David Nahamoo, Lazaros C. Polymenakos, Jatin Ganhotra, Kshitij P. Fadnis

Abstract: We propose a novel methodology to address dialog learning in the context of goal-oriented conversational systems. The key idea is to quantize the dialog space into clusters and create a language model across the clusters, thus allowing for an accurate choice of the next utterance in the conversation. The language model relies on n-grams associated with clusters of utterances. This quantized-dialog language model methodology has been applied to the end-to-end goal-oriented track of the latest Dialog System Technology Challenges (DSTC6). The objective is to find the correct system utterance from a pool of candidates in order to complete a dialog between a user and an automated restaurant-reservation system. Our results show that the technique proposed in this paper achieves high accuracy regarding selection of the correct candidate utterance, and outperforms other state-of-the-art approaches based on neural networks.
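
The idea of an n-gram model over utterance clusters can be illustrated with a minimal sketch. It assumes utterances have already been mapped to integer cluster IDs (the clustering step is not shown), uses a bigram with add-alpha smoothing, and introduces hypothetical helper names (`train_bigram`, `score`, `pick_next`). None of these choices are taken from the paper, which may differ in n-gram order, smoothing, and candidate scoring.

```python
from collections import Counter, defaultdict

def train_bigram(cluster_dialogs):
    """Count cluster-to-cluster transitions over training dialogs.

    Each dialog is a list of cluster IDs, one per utterance (assumed input).
    """
    counts = defaultdict(Counter)
    for dialog in cluster_dialogs:
        for prev, nxt in zip(dialog, dialog[1:]):
            counts[prev][nxt] += 1
    return counts

def score(counts, prev, candidate, num_clusters, alpha=1.0):
    """Add-alpha smoothed estimate of P(candidate cluster | previous cluster)."""
    total = sum(counts[prev].values())
    return (counts[prev][candidate] + alpha) / (total + alpha * num_clusters)

def pick_next(counts, prev, candidate_clusters, num_clusters):
    """Return the index of the candidate whose cluster scores highest."""
    scores = [score(counts, prev, c, num_clusters) for c in candidate_clusters]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy data: three dialogs over five hypothetical clusters (IDs 0..4).
dialogs = [[0, 1, 2, 3], [0, 1, 2, 4], [0, 1, 2, 3]]
counts = train_bigram(dialogs)
best = pick_next(counts, prev=2, candidate_clusters=[4, 3, 1], num_clusters=5)
print(best)
```

In this toy setup, the candidate pool plays the role of the DSTC6 candidate list: each candidate utterance is reduced to its cluster ID and ranked by how likely that cluster is to follow the previous one.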

Direct Acoustics-to-Word Models for English Conversational Speech Recognition

Mar 22, 2017

Kartik Audhkhasi, Bhuvana Ramabhadran, George Saon, Michael Picheny, David Nahamoo

Abstract: Recent work on end-to-end automatic speech recognition (ASR) has shown that the connectionist temporal classification (CTC) loss can be used to convert acoustics to phone or character sequences. Such systems are used with a dictionary and a separately trained language model (LM) to produce word sequences. However, they are not truly end-to-end in the sense of mapping acoustics directly to words without an intermediate phone representation. In this paper, we present the first results employing direct acoustics-to-word CTC models on two well-known public benchmark tasks: Switchboard and CallHome. These models do not require an LM or even a decoder at run-time and hence recognize speech with minimal complexity. However, due to the large number of word output units, CTC word models require orders of magnitude more data to train reliably compared to traditional systems. We present some techniques to mitigate this issue. Our CTC word model achieves a word error rate of 13.0%/18.8% on the Hub5-2000 Switchboard/CallHome test sets without any LM or decoder, compared with 9.6%/16.0% for phone-based CTC with a 4-gram LM. We also present rescoring results on CTC word model lattices to quantify the performance benefits of an LM, and contrast the performance of word and phone CTC models.

* Submitted to Interspeech-2017
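
To make "no LM or decoder at run-time" concrete, the sketch below shows the standard greedy (best-path) readout for a CTC model with word output units: take the top-scoring unit per frame, merge consecutive repeats, and drop blanks. This is an assumption about how such a model could be read out, not a description of the authors' system; the function name `ctc_greedy_decode`, the toy vocabulary, and the frame scores are all hypothetical.

```python
def ctc_greedy_decode(frame_scores, vocab, blank=0):
    """Greedy CTC readout: argmax per frame, merge repeats, remove blanks.

    frame_scores: list of per-frame score lists, one score per output unit.
    vocab: list mapping unit index to word; index `blank` is the CTC blank.
    """
    # Best unit index for each frame.
    best = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]

    words, prev = [], None
    for idx in best:
        # Emit only when the unit changes and is not the blank symbol.
        if idx != prev and idx != blank:
            words.append(vocab[idx])
        prev = idx
    return words

# Toy example with a three-unit vocabulary (hypothetical values).
vocab = ["<blank>", "hello", "world"]
frames = [
    [0.1, 0.8, 0.1],    # hello
    [0.1, 0.8, 0.1],    # hello (repeat, merged)
    [0.9, 0.05, 0.05],  # blank separates the two "hello" tokens
    [0.1, 0.8, 0.1],    # hello
    [0.2, 0.1, 0.7],    # world
]
print(ctc_greedy_decode(frames, vocab))
```

Because the output units are whole words, the readout yields a word sequence directly, with no pronunciation dictionary or search over a language model.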
