Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Arushi Raghuvanshi

Multimodal Contextual Dialogue Breakdown Detection for Conversational AI Models

Apr 11, 2024

Md Messal Monem Miah, Ulie Schnaithmann, Arushi Raghuvanshi, Youngseo Son

Abstract:Detecting dialogue breakdown in real time is critical for conversational AI systems, because it enables taking corrective action to successfully complete a task. In spoken dialog systems, this breakdown can be caused by a variety of unexpected situations including high levels of background noise, causing STT mistranscriptions, or unexpected user flows. In particular, industry settings like healthcare, require high precision and high flexibility to navigate differently based on the conversation history and dialogue states. This makes it both more challenging and more critical to accurately detect dialog breakdown. To accurately detect breakdown, we found it requires processing audio inputs along with downstream NLP model inferences on transcribed text in real time. In this paper, we introduce a Multimodal Contextual Dialogue Breakdown (MultConDB) model. This model significantly outperforms other known best models by achieving an F1 of 69.27.

* Published in NAACL 2024 Industry Track

Via

Access Paper or Ask Questions

Graph Integrated Language Transformers for Next Action Prediction in Complex Phone Calls

Apr 11, 2024

Amin Hosseiny Marani, Ulie Schnaithmann, Youngseo Son, Akil Iyer, Manas Paldhe, Arushi Raghuvanshi

Figure 1 for Graph Integrated Language Transformers for Next Action Prediction in Complex Phone Calls

Figure 2 for Graph Integrated Language Transformers for Next Action Prediction in Complex Phone Calls

Figure 3 for Graph Integrated Language Transformers for Next Action Prediction in Complex Phone Calls

Figure 4 for Graph Integrated Language Transformers for Next Action Prediction in Complex Phone Calls

Abstract:Current Conversational AI systems employ different machine learning pipelines, as well as external knowledge sources and business logic to predict the next action. Maintaining various components in dialogue managers' pipeline adds complexity in expansion and updates, increases processing time, and causes additive noise through the pipeline that can lead to incorrect next action prediction. This paper investigates graph integration into language transformers to improve understanding the relationships between humans' utterances, previous, and next actions without the dependency on external sources or components. Experimental analyses on real calls indicate that the proposed Graph Integrated Language Transformer models can achieve higher performance compared to other production level conversational AI systems in driving interactive calls with human users in real-world settings.

* Published in NAACL 2024 Industry Track

Via

Access Paper or Ask Questions

Leveraging Explicit Procedural Instructions for Data-Efficient Action Prediction

Jun 06, 2023

Julia White, Arushi Raghuvanshi, Yada Pruksachatkun

Abstract:Task-oriented dialogues often require agents to enact complex, multi-step procedures in order to meet user requests. While large language models have found success automating these dialogues in constrained environments, their widespread deployment is limited by the substantial quantities of task-specific data required for training. The following paper presents a data-efficient solution to constructing dialogue systems, leveraging explicit instructions derived from agent guidelines, such as company policies or customer service manuals. Our proposed Knowledge-Augmented Dialogue System (KADS) combines a large language model with a knowledge retrieval module that pulls documents outlining relevant procedures from a predefined set of policies, given a user-agent interaction. To train this system, we introduce a semi-supervised pre-training scheme that employs dialogue-document matching and action-oriented masked language modeling with partial parameter freezing. We evaluate the effectiveness of our approach on prominent task-oriented dialogue datasets, Action-Based Conversations Dataset and Schema-Guided Dialogue, for two dialogue tasks: action state tracking and workflow discovery. Our results demonstrate that procedural knowledge augmentation improves accuracy predicting in- and out-of-distribution actions while preserving high performance in settings with low or sparse data.

Via

Access Paper or Ask Questions