Abstract:Large Language Models (LLMs) show promise as a writing aid for professionals performing legal analyses. However, LLMs can often hallucinate in this setting, in ways difficult to recognize by non-professionals and existing text evaluation metrics. In this work, we pose the question: when can machine-generated legal analysis be evaluated as acceptable? We introduce the neutral notion of gaps, as opposed to hallucinations in a strict erroneous sense, to refer to the difference between human-written and machine-generated legal analysis. Gaps do not always equate to invalid generation. Working with legal experts, we consider the CLERC generation task proposed in Hou et al. (2024b), leading to a taxonomy, a fine-grained detector for predicting gap categories, and an annotated dataset for automatic evaluation. Our best detector achieves 67% F1 score and 80% precision on the test set. Employing this detector as an automated metric on legal analysis generated by SOTA LLMs, we find around 80% contain hallucinations of different kinds.
Abstract:In recent years, massive language models consisting exclusively of transformer decoders, led by the GPT-x family, have become increasingly popular. While studies have examined the behavior of these models, they tend to only focus on the output of the language model, avoiding analyzing their internal states despite such analyses being popular tools used within BERTology to study transformer encoders. We present a collection of methods for analyzing GPT-2's hidden states, and use the model's navigation of garden path sentences as a case study to demonstrate the utility of studying this model's behavior beyond its output alone. To support this analysis, we introduce a novel dataset consisting of 3 different types of garden path sentences, along with scripts to manipulate them. We find that measuring Manhattan distances and cosine similarities between hidden states shows that GPT-2 navigates these sentences more intuitively than conventional methods that predict from the model's output alone.