Jacob Watson

Are Frontier Large Language Models Suitable for Q&A in Science Centres?

Dec 06, 2024

Jacob Watson, Fabrício Góes, Marco Volpe, Talles Medeiros

Abstract: This paper investigates the suitability of frontier Large Language Models (LLMs) for Q&A interactions in science centres, with the aim of boosting visitor engagement while maintaining factual accuracy. Using a dataset of questions collected from the National Space Centre in Leicester (UK), we evaluated responses generated by three leading models: OpenAI's GPT-4, Claude 3.5 Sonnet, and Google Gemini 1.5. Each model was prompted for both standard and creative responses tailored to an 8-year-old audience, and these responses were assessed by space science experts based on accuracy, engagement, clarity, novelty, and deviation from expected answers. The results revealed a trade-off between creativity and accuracy, with Claude outperforming GPT and Gemini in both maintaining clarity and engaging young audiences, even when asked to generate more creative responses. Nonetheless, experts observed that higher novelty was generally associated with reduced factual reliability across all models. This study highlights the potential of LLMs in educational settings, emphasizing the need for careful prompt engineering to balance engagement with scientific rigor.

* 19 pages, 2 figures, 10 tables
