Johane Takeuchi

Honda Research Institute Japan

Dialogue You Can Trust: Human and AI Perspectives on Generated Conversations

Sep 03, 2024

Ike Ebubechukwu, Johane Takeuchi, Antonello Ceravola, Frank Joublin

Abstract: As dialogue systems and chatbots increasingly integrate into everyday interactions, the need for efficient and accurate evaluation methods becomes paramount. This study explores the comparative performance of human and AI assessments across a range of dialogue scenarios, focusing on seven key performance indicators (KPIs): Coherence, Innovation, Concreteness, Goal Contribution, Commonsense Contradiction, Incorrect Fact, and Redundancy. Utilizing the GPT-4o API, we generated a diverse dataset of conversations and conducted a two-part experimental analysis. In Experiment 1, we evaluated multi-party conversations on Coherence, Innovation, Concreteness, and Goal Contribution, revealing that GPT models align closely with human judgments. Notably, both human and AI evaluators exhibited a tendency towards binary judgment rather than linear scaling, highlighting a shared challenge in these assessments. Experiment 2 extended the work of Finch et al. (2023) by focusing on dyadic dialogues and assessing Commonsense Contradiction, Incorrect Fact, and Redundancy. The results indicate that while GPT-4o demonstrates strong performance in maintaining factual accuracy and commonsense reasoning, it still struggles with reducing redundancy and self-contradiction. Our findings underscore the potential of GPT models to closely replicate human evaluation in dialogue systems, while also pointing to areas for improvement. This research offers valuable insights for advancing the development and implementation of more refined dialogue evaluation methodologies, contributing to the evolution of more effective and human-like AI communication tools.

* 17 pages, 15 figures; a shorter version was submitted to the 22nd Annual Workshop of the Australasian Language Technology Association (ALTA'24)
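The abstract reports that GPT ratings align closely with human ratings and that both rater groups lean towards binary rather than linear use of the scale. It does not name the agreement statistic or the rating scale, so the following is only one hedged way to quantify both effects for a single KPI: Spearman rank correlation between paired human and GPT scores, and the share of ratings that sit at the scale endpoints. The 1 to 5 scale, the toy ratings, and the functions `average_ranks`, `spearman` and `extreme_share` are hypothetical and are not the authors' code.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of equal values that starts at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("constant ratings: correlation is undefined")
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def extreme_share(ratings, low=1, high=5):
    """Fraction of ratings at either end of an assumed low..high scale."""
    return sum(r in (low, high) for r in ratings) / len(ratings)


# Hypothetical toy ratings for one KPI on 8 dialogues; not data from the paper.
human = [5, 1, 5, 4, 1, 5, 2, 1]
gpt = [5, 1, 4, 5, 1, 5, 1, 1]

print("Spearman rho:", round(spearman(human, gpt), 3))
print("Extreme share (human, GPT):", extreme_share(human), extreme_share(gpt))
```

A rank correlation is used here because ordinal ratings need not be evenly spaced; a high `extreme_share` on a scale intended to be used linearly would be consistent with the binary tendency the abstract describes.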

Utilization of domain knowledge to improve POMDP belief estimation

Feb 17, 2023

Tung Nguyen, Johane Takeuchi

Abstract: The partially observable Markov decision process (POMDP) framework is a common approach to decision making under uncertainty. Recently, multiple studies have shown that integrating relevant domain knowledge into POMDP belief estimation can improve the performance of the learned policy. In this study, we propose a novel method for integrating domain knowledge into the probabilistic belief update of the POMDP framework using Jeffrey's rule and normalization. We show that domain knowledge can be used to reduce the data requirement and improve performance in POMDP policy learning with reinforcement learning (RL).

* 5 pages, 2 figures
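The abstract names Jeffrey's rule and normalization but not the exact formulation. As a minimal sketch, assume domain knowledge arrives as revised probabilities q_i for a partition {E_i} of the state space. The usual Bayesian belief update b'(s') ∝ O(o | s', a) Σ_s T(s' | s, a) b(s) is applied first, then Jeffrey's rule P_new(s) = Σ_i q_i P(s | E_i), then a final normalization. The function names, the `T[a][s][s']` and `O[a][s'][o]` layouts, and the toy model are assumptions, not the paper's implementation.

```python
def bayes_belief_update(belief, action, obs, T, O):
    """Standard POMDP update: b'(s') ∝ O[a][s'][o] * sum_s T[a][s][s'] * b(s)."""
    n = len(belief)
    unnorm = [
        O[action][s2][obs] * sum(T[action][s][s2] * belief[s] for s in range(n))
        for s2 in range(n)
    ]
    z = sum(unnorm)
    if z == 0:
        raise ValueError("observation has zero probability under the model")
    return [u / z for u in unnorm]


def jeffrey_update(belief, partition, new_probs):
    """Jeffrey's rule, P_new(s) = sum_i q_i * P(s | E_i), followed by normalization.

    partition: list of events E_i (lists of state indices), assumed to cover every state once.
    new_probs: revised probabilities q_i supplied by domain knowledge (hypothetical input).
    """
    revised = [0.0] * len(belief)
    for event, q in zip(partition, new_probs):
        mass = sum(belief[s] for s in event)  # current P(E_i)
        if mass == 0:
            continue  # P(s | E_i) is undefined, so this event contributes nothing
        for s in event:
            revised[s] += q * belief[s] / mass
    # Normalization only matters if new_probs does not sum to 1 or an event was skipped.
    z = sum(revised)
    if z == 0:
        raise ValueError("domain knowledge is incompatible with the current belief")
    return [r / z for r in revised]


# Hypothetical 3-state model with a single action (index 0) and two observations.
T = [[[0.9, 0.1, 0.0], [0.0, 0.9, 0.1], [0.1, 0.0, 0.9]]]  # T[a][s][s']
O = [[[0.7, 0.3], [0.4, 0.6], [0.2, 0.8]]]  # O[a][s'][o]

b = bayes_belief_update([1 / 3, 1 / 3, 1 / 3], action=0, obs=1, T=T, O=O)
# Domain knowledge (assumed): states {0, 1} jointly have probability 0.9, state 2 has 0.1.
b = jeffrey_update(b, partition=[[0, 1], [2]], new_probs=[0.9, 0.1])
print(b)
```

Jeffrey's rule keeps the relative probabilities of states inside each event unchanged and only rescales each event's total mass, which is why it suits knowledge stated at the level of groups of states rather than individual ones.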

Apprenticeship Learning for Model Parameters of Partially Observable Environments

Jun 27, 2012

Takaki Makino, Johane Takeuchi

Abstract: We consider apprenticeship learning, i.e., having an agent learn a task by observing an expert demonstrate it in a partially observable environment whose model is uncertain. This setting is useful in applications where explicitly modeling the environment is difficult, such as dialogue systems. We show that information about the environment model can be extracted by inferring the action-selection process behind the demonstration, under the assumption that the expert chooses optimal actions based on knowledge of the true model of the target environment. The proposed algorithms can achieve more accurate estimates of POMDP parameters and better policies from a short demonstration than methods that learn only from the environment's reactions.

* Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012)
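The central idea in the abstract is that, if the expert acts optimally under the true model, its action choices carry likelihood information about the model parameters on top of the observations. The sketch below illustrates only that idea, under strong simplifying assumptions that do not come from the paper: a two-state problem resembling the classic tiger problem, whose only unknown parameter is the observation accuracy p; a finite set of candidate values for p; and an expert modeled as Boltzmann-rational with respect to one-step-lookahead Q-values, standing in for a truly optimal POMDP policy. All names, rewards and the demonstration are hypothetical.

```python
import math

# Hypothetical tiger-style rewards (not from the paper).
LISTEN_COST = -1.0
CORRECT_GUESS = 10.0
WRONG_GUESS = -100.0


def update(b1, obs, p):
    """Bayes update of b1 = P(state = 1); an observation reports the true state with probability p."""
    like1 = p if obs == 1 else 1.0 - p
    like0 = 1.0 - p if obs == 1 else p
    return like1 * b1 / (like1 * b1 + like0 * (1.0 - b1))


def obs_prob(b1, obs, p):
    """Predictive probability of the next observation under belief b1."""
    p_obs1 = b1 * p + (1.0 - b1) * (1.0 - p)
    return p_obs1 if obs == 1 else 1.0 - p_obs1


def best_guess_value(b1):
    """Expected reward of the better of the two guesses under belief b1."""
    return max(b1 * CORRECT_GUESS + (1.0 - b1) * WRONG_GUESS,
               (1.0 - b1) * CORRECT_GUESS + b1 * WRONG_GUESS)


def q_values(b1, p):
    """Q-values for guessing now, or listening once and then guessing (one-step lookahead)."""
    listen = LISTEN_COST + sum(
        obs_prob(b1, o, p) * best_guess_value(update(b1, o, p)) for o in (0, 1))
    return {
        "guess0": (1.0 - b1) * CORRECT_GUESS + b1 * WRONG_GUESS,
        "guess1": b1 * CORRECT_GUESS + (1.0 - b1) * WRONG_GUESS,
        "listen": listen,
    }


def log_action_prob(action, b1, p, beta):
    """log pi(action | b1) for a Boltzmann-rational expert (assumed stand-in for optimality)."""
    scaled = {a: beta * q for a, q in q_values(b1, p).items()}
    m = max(scaled.values())
    log_z = m + math.log(sum(math.exp(v - m) for v in scaled.values()))
    return scaled[action] - log_z


def episode_loglik(episode, p, beta, use_actions):
    """Log-likelihood of one episode; the hidden state stays fixed while the expert listens."""
    observations, final_guess = episode
    b1, total = 0.5, 0.0
    for o in observations:
        if use_actions:
            total += log_action_prob("listen", b1, p, beta)
        total += math.log(obs_prob(b1, o, p))
        b1 = update(b1, o, p)
    if use_actions:
        total += log_action_prob(final_guess, b1, p, beta)
    return total


def posterior(demos, candidates, beta=0.1, use_actions=True):
    """Posterior over candidate accuracies p under a uniform prior."""
    logs = [sum(episode_loglik(ep, p, beta, use_actions) for ep in demos) for p in candidates]
    m = max(logs)
    weights = [math.exp(ll - m) for ll in logs]
    z = sum(weights)
    return [w / z for w in weights]


# Hypothetical demonstration: (observations heard while listening, final guess).
demos = [([1, 1], "guess1"), ([0, 1, 0, 0], "guess0"), ([0, 0], "guess0")]
candidates = [0.6, 0.75, 0.9]
print("reactions only:   ", posterior(demos, candidates, use_actions=False))
print("reactions+actions:", posterior(demos, candidates, use_actions=True))
```

Comparing the two printed posteriors shows how the expert's choices (how long it kept listening before guessing) reshape the estimate of p beyond what the observations alone imply. A faithful version would replace the Boltzmann one-step model with the expert's optimal POMDP policy under each candidate model, as the abstract assumes.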
