Moosung Kim

When Crowd Meets Persona: Creating a Large-Scale Open-Domain Persona Dialogue Corpus

Apr 01, 2023

Won Ik Cho, Yoon Kyung Lee, Seoyeon Bae, Jihwan Kim, Sangah Park, Moosung Kim, Sowon Hahn, Nam Soo Kim

Abstract: Building a natural language dataset requires caution since word semantics is vulnerable to subtle text change or the definition of the annotated concept. Such a tendency can be seen in generative tasks like question-answering and dialogue generation and also in tasks that create a categorization-based corpus, like topic classification or sentiment analysis. Open-domain conversations involve two or more crowdworkers freely conversing about any topic, and collecting such data is particularly difficult for two reasons: 1) the dataset should be "crafted" rather than "obtained" due to privacy concerns, and 2) paid creation of such dialogues may differ from how crowdworkers behave in real-world settings. In this study, we tackle these issues when creating a large-scale open-domain persona dialogue corpus, where persona implies that the conversation is performed by several actors with a fixed persona and user-side workers from an unspecified crowd.

* Presented at HCOMP 2022 as Works-in-Progress
