Ulie Schnaithmann

Multimodal Contextual Dialogue Breakdown Detection for Conversational AI Models

Apr 11, 2024

Md Messal Monem Miah, Ulie Schnaithmann, Arushi Raghuvanshi, Youngseo Son

Abstract: Detecting dialogue breakdown in real time is critical for conversational AI systems because it enables corrective action so that a task can still be completed successfully. In spoken dialogue systems, breakdown can be caused by a variety of unexpected situations, including unexpected user flows and high levels of background noise that cause speech-to-text (STT) mistranscriptions. In particular, industry settings such as healthcare require high precision and high flexibility to navigate differently based on the conversation history and dialogue state. This makes accurate detection of dialogue breakdown both more challenging and more critical. We found that accurate detection requires processing audio inputs together with downstream NLP model inferences on the transcribed text in real time. In this paper, we introduce a Multimodal Contextual Dialogue Breakdown (MultConDB) model, which significantly outperforms the best previously known models, achieving an F1 of 69.27.

* Published in NAACL 2024 Industry Track

Graph Integrated Language Transformers for Next Action Prediction in Complex Phone Calls

Apr 11, 2024

Amin Hosseiny Marani, Ulie Schnaithmann, Youngseo Son, Akil Iyer, Manas Paldhe, Arushi Raghuvanshi

Abstract: Current conversational AI systems employ different machine learning pipelines, as well as external knowledge sources and business logic, to predict the next action. Maintaining the various components in a dialogue manager's pipeline complicates expansion and updates, increases processing time, and introduces additive noise through the pipeline that can lead to incorrect next action prediction. This paper investigates integrating graphs into language transformers to improve understanding of the relationships among human utterances, previous actions, and next actions without depending on external sources or components. Experimental analyses on real calls indicate that the proposed Graph Integrated Language Transformer models can achieve higher performance than other production-level conversational AI systems in driving interactive calls with human users in real-world settings.

* Published in NAACL 2024 Industry Track
