Yin Jou Huang

Beyond Self-Reports: Multi-Observer Agents for Personality Assessment in Large Language Models

Apr 11, 2025

Yin Jou Huang, Rafik Hadfi

Abstract: There is growing interest in assessing the personality traits of large language models (LLMs). However, traditional personality assessments based on self-report questionnaires may fail to capture their true behavioral nuances due to inherent biases and meta-knowledge contamination. This paper introduces a novel multi-observer framework for LLM personality assessment that draws inspiration from informant-report methods in psychology. Instead of relying solely on self-assessments, our approach employs multiple observer agents, each configured with a specific relationship context (e.g., family, friend, or workplace), to simulate interactive scenarios with a subject LLM. These observers engage in dialogues and subsequently provide ratings across the Big Five personality dimensions. Our experiments reveal that LLMs exhibit systematic biases in self-report personality ratings. Moreover, aggregating observer ratings effectively reduces non-systematic biases and achieves optimal reliability with 5-7 observers. The findings highlight the significant impact of relationship context on personality perception and demonstrate that a multi-observer paradigm yields a more robust and context-sensitive evaluation of LLM personality traits.

* 13 pages, 5 figures, 2 tables
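
The aggregation step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the plain per-trait average, the 1-5 rating scale, the function name `aggregate_observer_ratings`, and the sample ratings are all assumptions made for illustration.

```python
from statistics import mean

# The Big Five dimensions named in the abstract.
BIG_FIVE = (
    "openness",
    "conscientiousness",
    "extraversion",
    "agreeableness",
    "neuroticism",
)


def aggregate_observer_ratings(ratings):
    """Average each Big Five dimension over all observers.

    `ratings` is a list with one dict per observer, mapping trait name
    to a numeric score. Plain averaging is an assumed aggregation rule.
    """
    if not ratings:
        raise ValueError("at least one observer rating is required")
    return {trait: mean(r[trait] for r in ratings) for trait in BIG_FIVE}


# Hypothetical ratings on an assumed 1-5 scale from three observers
# (for example family, friend, and workplace contexts).
observer_ratings = [
    {"openness": 4, "conscientiousness": 3, "extraversion": 5,
     "agreeableness": 4, "neuroticism": 2},
    {"openness": 5, "conscientiousness": 3, "extraversion": 4,
     "agreeableness": 4, "neuroticism": 1},
    {"openness": 3, "conscientiousness": 3, "extraversion": 3,
     "agreeableness": 4, "neuroticism": 3},
]

aggregated = aggregate_observer_ratings(observer_ratings)
print(aggregated)
```

Averaging over several observers is one simple way to damp rater-specific noise; the paper may use a different aggregation or reliability measure.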

Investigating Cost-Efficiency of LLM-Generated Training Data for Conversational Semantic Frame Analysis

Oct 09, 2024

Shiho Matta, Yin Jou Huang, Fei Cheng, Hirokazu Kiyomaru, Yugo Murawaki

Abstract: Recent studies have demonstrated that few-shot learning allows LLMs to generate training data for supervised models at a low cost. However, the quality of LLM-generated data may not entirely match that of human-labeled data. This raises a crucial question: how should one balance the trade-off between higher-quality but more expensive human data and lower-quality yet substantially cheaper LLM-generated data? In this paper, we synthesized training data for conversational semantic frame analysis using GPT-4 and examined how to allocate budgets optimally to achieve the best performance. Our experiments, conducted at various budget levels, reveal that optimal cost-efficiency is achieved by combining human and LLM-generated data across a wide range of budgets. Notably, as the budget decreases, a higher proportion of LLM-generated data becomes preferable.

* 12 pages including 4 pages of references and appendix. 7 figures
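
The budget trade-off described in the abstract can be made concrete with a small sketch that enumerates candidate splits of a fixed budget between human-labeled and LLM-generated examples. This is only an illustration under stated assumptions: the function name `enumerate_allocations`, the per-example costs, and the budget are hypothetical, and the downstream performance of each split would still have to be measured by training a model.

```python
def enumerate_allocations(budget, human_cost, llm_cost, steps=4):
    """List candidate splits of `budget` between human and LLM data.

    All monetary values are integers in the same unit (e.g. cents),
    which avoids floating-point rounding in the floor divisions.
    """
    allocations = []
    for i in range(steps + 1):
        human_budget = budget * i // steps   # part spent on human labels
        llm_budget = budget - human_budget   # remainder spent on LLM data
        allocations.append({
            "human_share": i / steps,
            "n_human": human_budget // human_cost,
            "n_llm": llm_budget // llm_cost,
        })
    return allocations


# Hypothetical prices: 100 cents per human-labeled example,
# 10 cents per LLM-generated example, 10000 cents in total.
allocations = enumerate_allocations(budget=10000, human_cost=100, llm_cost=10)
for a in allocations:
    print(a)
```

Each printed row is one candidate mix; comparing the models trained on those mixes is what would reveal the most cost-efficient split at a given budget.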

How Personality Traits Influence Negotiation Outcomes? A Simulation based on Large Language Models

Jul 16, 2024

Yin Jou Huang, Rafik Hadfi

Abstract: Psychological evidence reveals the influence of personality traits on decision-making. For instance, agreeableness is generally associated with positive outcomes in negotiations, whereas neuroticism is often linked to less favorable outcomes. This paper introduces a simulation framework centered on Large Language Model (LLM) agents endowed with synthesized personality traits. The agents negotiate within bargaining domains and possess customizable personalities and objectives. The experimental results show that LLM-based simulations can reproduce behavioral patterns observed in human negotiations. The contribution is twofold. First, we propose a simulation methodology that investigates the alignment between the linguistic and economic capabilities of LLM agents. Second, we offer empirical insights into the strategic impact of Big Five personality traits on the outcomes of bilateral negotiations. We also provide a case study based on synthesized bargaining dialogues to reveal intriguing behaviors, including deceitful and compromising behaviors.

* 13 pages, 4 figures

RecMind: Japanese Movie Recommendation Dialogue with Seeker's Internal State

Feb 21, 2024

Takashi Kodama, Hirokazu Kiyomaru, Yin Jou Huang, Sadao Kurohashi

Abstract: Humans pay careful attention to the interlocutor's internal state in dialogues. For example, in recommendation dialogues, we make recommendations while estimating the seeker's internal state, such as their level of knowledge and interest. Since there are no existing annotated resources for such analysis, we constructed RecMind, a Japanese movie recommendation dialogue dataset with annotations of the seeker's internal state at the entity level. Each entity has a subjective label annotated by the seeker and an objective label annotated by the recommender. RecMind also features engaging dialogues with long seeker utterances, enabling a detailed analysis of the seeker's internal state. Our analysis based on RecMind reveals that entities the seeker has no knowledge of but is interested in contribute to recommendation success. We also propose a response generation framework that explicitly considers the seeker's internal state, using chain-of-thought prompting. The human evaluation results show that our proposed method outperforms the baseline method in both consistency and recommendation success.
