Title: ChainForge: A Visual Toolkit for Prompt Engineering and LLM Hypothesis Testing

Sep 17, 2023

Ian Arawjo, Chelse Swoopes, Priyan Vaithilingam, Martin Wattenberg, Elena Glassman

Abstract: Evaluating outputs of large language models (LLMs) is challenging, requiring making -- and making sense of -- many responses. Yet tools that go beyond basic prompting tend to require knowledge of programming APIs, focus on narrow domains, or are closed-source. We present ChainForge, an open-source visual toolkit for prompt engineering and on-demand hypothesis testing of text generation LLMs. ChainForge provides a graphical interface for comparison of responses across models and prompt variations. Our system was designed to support three tasks: model selection, prompt template design, and hypothesis testing (e.g., auditing). We released ChainForge early in its development and iterated on its design with academics and online users. Through in-lab and interview studies, we find that a range of people could use ChainForge to investigate hypotheses that matter to them, including in real-world settings. We identify three modes of prompt engineering and LLM hypothesis testing: opportunistic exploration, limited evaluation, and iterative refinement.

* 23 pages, 7 figures, in submission
