Alison Smith-Renner

Harnessing the Power of LLMs: Evaluating Human-AI Text Co-Creation through the Lens of News Headline Generation

Oct 18, 2023

Zijian Ding, Alison Smith-Renner, Wenjuan Zhang, Joel R. Tetreault, Alejandro Jaimes

Abstract: To explore how humans can best leverage LLMs for writing and how interacting with these models affects feelings of ownership and trust in the writing process, we compared common human-AI interaction types (e.g., guiding system, selecting from system outputs, post-editing outputs) in the context of LLM-assisted news headline generation. While LLMs alone can generate satisfactory news headlines, on average, human control is needed to fix undesirable model outputs. Of the interaction methods, guiding and selecting model output added the most benefit with the lowest cost (in time and effort). Further, AI assistance did not harm participants' perception of control compared to freeform editing.

An Exploration of Post-Editing Effectiveness in Text Summarization

Jun 13, 2022

Vivian Lai, Alison Smith-Renner, Ke Zhang, Ruijia Cheng, Wenjuan Zhang, Joel Tetreault, Alejandro Jaimes

Abstract: Automatic summarization methods are efficient but can suffer from low quality. In comparison, manual summarization is expensive but produces higher quality. Can humans and AI collaborate to improve summarization performance? In similar text generation tasks (e.g., machine translation), human-AI collaboration in the form of "post-editing" AI-generated text reduces human workload and improves the quality of AI output. Therefore, we explored whether post-editing offers advantages in text summarization. Specifically, we conducted an experiment with 72 participants, comparing post-editing provided summaries with manual summarization for summary quality, human efficiency, and user experience on formal (XSum news) and informal (Reddit posts) text. This study sheds valuable insights on when post-editing is useful for text summarization: it helped in some cases (e.g., when participants lacked domain knowledge) but not in others (e.g., when provided summaries include inaccurate information). Participants' different editing strategies and needs for assistance offer implications for future human-AI summarization systems.

* 18 pages, 21 figures

Towards a Science of Human-AI Decision Making: A Survey of Empirical Studies

Dec 21, 2021

Vivian Lai, Chacha Chen, Q. Vera Liao, Alison Smith-Renner, Chenhao Tan

Abstract: As AI systems demonstrate increasingly strong predictive performance, their adoption has grown in numerous domains. However, in high-stakes domains such as criminal justice and healthcare, full automation is often not desirable due to safety, ethical, and legal concerns, yet fully manual approaches can be inaccurate and time consuming. As a result, there is growing interest in the research community to augment human decision making with AI assistance. Besides developing AI technologies for this purpose, the emerging field of human-AI decision making must embrace empirical approaches to form a foundational understanding of how humans interact and work with AI to make decisions. To invite and help structure research efforts towards a science of understanding and improving human-AI decision making, we survey recent literature of empirical human-subject studies on this topic. We summarize the study design choices made in over 100 papers in three important aspects: (1) decision tasks, (2) AI models and AI assistance elements, and (3) evaluation metrics. For each aspect, we summarize current trends, discuss gaps in current practices of the field, and make a list of recommendations for future research. Our survey highlights the need to develop common frameworks to account for the design and research spaces of human-AI decision making, so that researchers can make rigorous choices in study design, and the research community can build on each other's work and produce generalizable scientific knowledge. We also hope this survey will serve as a bridge for HCI and AI communities to work together to mutually shape the empirical science and computational technologies for human-AI decision making.

* 36 pages, 2 figures, see https://haidecisionmaking.github.io for website

Why Didn't You Listen to Me? Comparing User Control of Human-in-the-Loop Topic Models

Jun 04, 2019

Varun Kumar, Alison Smith-Renner, Leah Findlater, Kevin Seppi, Jordan Boyd-Graber

Abstract: To address the lack of comparative evaluation of Human-in-the-Loop Topic Modeling (HLTM) systems, we implement and evaluate three contrasting HLTM modeling approaches using simulation experiments. These approaches extend previously proposed frameworks, including constraints and informed prior-based methods. Users should have a sense of control in HLTM systems, so we propose a control metric to measure whether refinement operations' results match users' expectations. Informed prior-based methods provide better control than constraints, but constraints yield higher quality topics.

* In proceedings of ACL 2019
