Title: FANS: Fusing ASR and NLU for on-device SLU

Oct 31, 2021

Martin Radfar, Athanasios Mouchtaris, Siegfried Kunzmann, Ariya Rastrow

Abstract: Spoken language understanding (SLU) systems translate voice input commands into semantics, which are encoded as an intent and pairs of slot tags and values. Most current SLU systems deploy a cascade of two neural models, where the first maps the input audio to a transcript (ASR) and the second predicts the intent and slots from the transcript (NLU). In this paper, we introduce FANS, a new end-to-end SLU model that fuses an ASR audio encoder to a multi-task NLU decoder to infer the intent, slot tags, and slot values directly from a given input audio, obviating the need for transcription. FANS consists of a shared audio encoder and three decoders, two of which are seq-to-seq decoders that predict non-null slot tags and slot values in parallel and in an auto-regressive manner. The encoder and decoder architectures of FANS are flexible, which allows us to leverage different combinations of LSTM, self-attention, and attenders. Our experiments show that, compared to state-of-the-art end-to-end SLU models, FANS reduces ICER and IRER by 30% and 7% relative, respectively, when tested on an in-house SLU dataset, and by 0.86% and 2% absolute when tested on a public SLU dataset.
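
To make the output format and the two error metrics concrete, here is a minimal Python sketch. It is not the authors' code. It assumes that ICER is the fraction of utterances whose intent is predicted incorrectly and that IRER is the fraction of utterances where the intent or any slot tag/value pair is wrong; the abstract does not define either metric. The names `SemanticFrame`, `icer`, and `irer`, as well as the example intents and slots, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SemanticFrame:
    """One SLU output: an intent plus (slot_tag, slot_value) pairs."""
    intent: str
    slots: tuple = ()  # e.g. (("artist", "queen"),)


def _check(refs, hyps):
    # Both lists must be non-empty and aligned utterance by utterance.
    if not refs or len(refs) != len(hyps):
        raise ValueError("refs and hyps must be non-empty and equal in length")


def icer(refs, hyps):
    """Assumed definition: share of utterances with a wrong intent."""
    _check(refs, hyps)
    wrong = sum(r.intent != h.intent for r, h in zip(refs, hyps))
    return wrong / len(refs)


def irer(refs, hyps):
    """Assumed definition: share of utterances with any intent or slot error."""
    _check(refs, hyps)
    wrong = sum(
        r.intent != h.intent or frozenset(r.slots) != frozenset(h.slots)
        for r, h in zip(refs, hyps)
    )
    return wrong / len(refs)


# Hypothetical reference annotations for four utterances.
refs = [
    SemanticFrame("PlayMusic", (("artist", "queen"),)),
    SemanticFrame("SetTimer", (("duration", "five minutes"),)),
    SemanticFrame("GetWeather", (("city", "boston"),)),
    SemanticFrame("StopMusic"),
]

# Hypothetical model outputs: utterance 2 has a wrong slot value,
# utterance 3 has a wrong intent, the other two are fully correct.
hyps = [
    SemanticFrame("PlayMusic", (("artist", "queen"),)),
    SemanticFrame("SetTimer", (("duration", "nine minutes"),)),
    SemanticFrame("GetTime", (("city", "boston"),)),
    SemanticFrame("StopMusic"),
]

print(f"ICER = {icer(refs, hyps):.2f}")
print(f"IRER = {irer(refs, hyps):.2f}")
```

Under these assumed definitions, IRER can never be lower than ICER on the same data, because every intent error also counts as an interpretation error.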

* Published in Interspeech 2021
