Xavier Holt

Improving the Accuracy and Efficiency of Legal Document Tagging with Large Language Models and Instruction Prompts

Apr 12, 2025

Emily Johnson, Xavier Holt, Noah Wilson

Abstract: Legal multi-label classification is a critical task for organizing and accessing the vast amount of legal documentation. Despite its importance, it faces challenges such as the complexity of legal language, intricate label dependencies, and significant label imbalance. In this paper, we propose Legal-LLM, a novel approach that leverages the instruction-following capabilities of Large Language Models (LLMs) through fine-tuning. We reframe the multi-label classification task as a structured generation problem, instructing the LLM to directly output the relevant legal categories for a given document. We evaluate our method on two benchmark datasets, POSTURE50K and EURLEX57K, using micro-F1 and macro-F1 scores. Our experimental results demonstrate that Legal-LLM outperforms a range of strong baseline models, including traditional methods and other Transformer-based approaches. Furthermore, ablation studies and human evaluations validate the effectiveness of our approach, particularly in handling label imbalance and generating relevant and accurate legal labels.
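
The two metrics named in the abstract respond differently to label imbalance, which the following minimal sketch illustrates. It is not the paper's evaluation code: the comma-separated output format, the helper names `parse_labels` and `micro_macro_f1`, and the toy labels are all assumptions made for illustration.

```python
from collections import Counter

def parse_labels(generated: str) -> set[str]:
    """Turn generated text into a label set (assumes comma-separated output)."""
    return {part.strip() for part in generated.split(",") if part.strip()}

def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def micro_macro_f1(gold: list[set[str]], pred: list[set[str]]) -> tuple[float, float]:
    """Micro-F1 pools counts over all labels; macro-F1 averages per-label F1."""
    labels = set().union(*gold, *pred)
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        for label in p & g:
            tp[label] += 1
        for label in p - g:
            fp[label] += 1
        for label in g - p:
            fn[label] += 1
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels) if labels else 0.0
    return micro, macro

# Hypothetical gold labels and generated outputs for three documents.
gold = [{"a", "b"}, {"a"}, {"c"}]
pred = [parse_labels("a"), parse_labels("a, b"), parse_labels("")]
micro, macro = micro_macro_f1(gold, pred)
```

In this toy case the frequent label `a` is always predicted correctly while the rare labels `b` and `c` are never matched, so macro-F1 falls well below micro-F1. That gap is why both scores are usually reported together on imbalanced label sets.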

Probabilistic Models of Relational Implication

Jul 28, 2019

Xavier Holt

Abstract: Relational data in its most basic form is a static collection of known facts. However, by learning to infer and deduce additional information and structure, we can massively increase the usefulness of the underlying data. One common form of inferential reasoning in knowledge bases is implication discovery. Here, by learning when one relation implies another, we can extend our knowledge representation. There are several existing models for relational implication; however, we argue they are motivated but not principled. To this end, we define a formal probabilistic model of relational implication. By using estimators based on the empirical distribution of our dataset, we demonstrate that our model outperforms existing approaches. While previous work achieves a best score of 0.7812 AUC on an evaluation dataset, our ProbE model improves this to 0.7915. Furthermore, we demonstrate that our model can be improved substantially through the use of link prediction models and dense latent representations of the underlying arguments and relations. This variant, denoted ProbL, improves the state of the art on our evaluation dataset to 0.8143. In addition to developing a new framework and providing novel scores of relational implication, we provide two pragmatic resources to assist future research. First, we motivate and develop an improved crowd framework for constructing labelled datasets of relational implication. Using this, we reannotate and make public a dataset comprising 17,848 instances of labelled relational implication. We demonstrate that precision (as evaluated by expert consensus with the crowd labels) on the resulting dataset improves from 53% to 95%.
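
To make the idea of an estimator based on an empirical distribution concrete, here is a minimal sketch of a simple co-occurrence score for relational implication. It is not the paper's ProbE or ProbL model, whose definitions the abstract does not give. The function name `implication_score`, the triple format, and the toy facts are assumptions for illustration only.

```python
from collections import defaultdict

def implication_score(triples, premise, hypothesis):
    """Estimate P(hypothesis holds for a pair | premise holds for that pair).

    `triples` is an iterable of (subject, relation, object) facts. The score
    is the fraction of argument pairs seen with `premise` that are also seen
    with `hypothesis`. This is a plain empirical conditional frequency.
    """
    pairs = defaultdict(set)
    for subj, rel, obj in triples:
        pairs[rel].add((subj, obj))
    premise_pairs = pairs[premise]
    if not premise_pairs:
        return 0.0  # no evidence for the premise relation
    return len(premise_pairs & pairs[hypothesis]) / len(premise_pairs)

# Hypothetical knowledge base.
facts = [
    ("A", "acquired", "B"),
    ("A", "owns", "B"),
    ("C", "acquired", "D"),
    ("C", "owns", "D"),
    ("E", "owns", "F"),
]
forward = implication_score(facts, "acquired", "owns")
backward = implication_score(facts, "owns", "acquired")
```

The score is asymmetric: in the toy data every pair related by `acquired` is also related by `owns`, but not the reverse, which matches the intuition that implication between relations is directional.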

Presenting a New Dataset for the Timeline Generation Problem

Nov 07, 2016

Xavier Holt, Will Radford, Ben Hachey

Abstract: The timeline generation task summarises an entity's biography by selecting stories representing key events from a large pool of relevant documents. This paper addresses the lack of a standard dataset and evaluative methodology for the problem. We present and make publicly available a new dataset of 18,793 news articles covering 39 entities. For each entity, we provide a gold standard timeline and a set of entity-related articles. We propose ROUGE as an evaluation metric and validate our dataset by showing that top Google results outperform straw-man baselines.
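
As a rough illustration of how a ROUGE-style score compares a generated timeline with a gold standard, the sketch below computes ROUGE-1 recall, the clipped unigram overlap divided by the number of reference unigrams. The abstract does not say which ROUGE variant or tokenisation the paper uses, so the lowercase whitespace tokenisation, the function name `rouge1_recall`, and the example sentences are assumptions.

```python
from collections import Counter

def rouge1_recall(reference: str, candidate: str) -> float:
    """ROUGE-1 recall with lowercase whitespace tokenisation (an assumption)."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    total = sum(ref_counts.values())
    if total == 0:
        return 0.0  # empty reference: nothing to recall
    # Counter intersection keeps the minimum count per token (clipped overlap).
    overlap = sum((ref_counts & cand_counts).values())
    return overlap / total

# Hypothetical gold timeline entry and system output.
score = rouge1_recall(
    "the senator won the election",
    "the senator lost the vote",
)
```

Here three of the five reference tokens are matched, giving a recall of 0.6. The example also shows a known limitation of unigram overlap: the candidate states the opposite outcome yet still scores above half.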
