Gijs van Dijck

Can LLMs Create Legally Relevant Summaries and Analyses of Videos?

Nov 15, 2025

Lyra Hoeben-Kuil, Gijs van Dijck, Jaromir Savelka, Johanna Gunawan, Konrad Kollnig, Marta Kolacz, Mindy Duffourc, Shashank Chakravarthy, Hannes Westermann

Abstract: Understanding the legally relevant factual basis of an event and conveying it through text is a key skill of legal professionals. This skill is important for preparing forms (e.g., insurance claims) or other legal documents (e.g., court claims), but often presents a challenge for laypeople. Current AI approaches aim to bridge this gap, but mostly rely on the user to articulate what has happened in text, which may be challenging for many. Here, we investigate the capability of large language models (LLMs) to understand and summarize events occurring in videos. We ask an LLM to summarize and draft legal letters, based on 120 YouTube videos showing legal issues in various domains. Overall, 71.7% of the summaries were rated as high or medium quality, which is a promising result, opening the door to a number of applications in, e.g., access to justice.

* Accepted for publication at JURIX 2025, Torino, Italy. This is the preprint version. Code and data available at: https://github.com/maastrichtlawtech/jurix2025_LLM_video_analysis

Know When to Fuse: Investigating Non-English Hybrid Retrieval in the Legal Domain

Sep 02, 2024

Antoine Louis, Gijs van Dijck, Gerasimos Spanakis

Abstract: Hybrid search has emerged as an effective strategy to offset the limitations of different matching paradigms, especially in out-of-domain contexts where notable improvements in retrieval quality have been observed. However, existing research predominantly focuses on a limited set of retrieval methods, evaluated in pairs on domain-general datasets exclusively in English. In this work, we study the efficacy of hybrid search across a variety of prominent retrieval models within the unexplored field of law in the French language, assessing both zero-shot and in-domain scenarios. Our findings reveal that in a zero-shot context, fusing different domain-general models consistently enhances performance compared to using a standalone model, regardless of the fusion method. Surprisingly, when models are trained in-domain, we find that fusion generally diminishes performance relative to using the best single system, unless fusing scores with carefully tuned weights. These novel insights, among others, expand the applicability of prior findings across a new field and language, and contribute to a deeper understanding of hybrid search in non-English specialized domains.
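
As a rough illustration of what "fusing" retrievers can mean, the sketch below shows two widely used fusion schemes: reciprocal rank fusion (rank-based) and a weighted sum of min-max normalized scores (score-based). It is not taken from the paper. The function names, the document ids, the example scores, and the constant `k=60` are all assumptions chosen for illustration, and the paper's actual fusion methods and weights may differ.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document ids by summing 1 / (k + rank).

    k=60 is a commonly used default; it is an assumption here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))


def min_max(scores):
    """Rescale one system's scores to [0, 1] so systems become comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 0.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}


def weighted_score_fusion(score_maps, weights):
    """Fuse per-system score dicts with a weighted sum of normalized scores."""
    fused = defaultdict(float)
    for scores, weight in zip(score_maps, weights):
        for doc_id, s in min_max(scores).items():
            fused[doc_id] += weight * s
    return sorted(fused, key=lambda d: (-fused[d], d))


# Hypothetical scores from a lexical and a dense retriever for one query.
lexical = {"art_12": 14.2, "art_7": 9.8, "art_3": 2.1}
dense = {"art_7": 0.82, "art_3": 0.79, "art_12": 0.40}

lexical_ranking = sorted(lexical, key=lexical.get, reverse=True)
dense_ranking = sorted(dense, key=dense.get, reverse=True)

rrf_order = reciprocal_rank_fusion([lexical_ranking, dense_ranking])
equal_order = weighted_score_fusion([lexical, dense], [0.5, 0.5])
dense_heavy_order = weighted_score_fusion([lexical, dense], [0.1, 0.9])
```

With these made-up scores, changing the weights from `[0.5, 0.5]` to `[0.1, 0.9]` reorders the second and third documents, which gives a small intuition for why score fusion can be sensitive to weight tuning.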

* Under review

ColBERT-XM: A Modular Multi-Vector Representation Model for Zero-Shot Multilingual Information Retrieval

Feb 23, 2024

Antoine Louis, Vageesh Saxena, Gijs van Dijck, Gerasimos Spanakis

Abstract: State-of-the-art neural retrievers predominantly focus on high-resource languages like English, which impedes their adoption in retrieval scenarios involving other languages. Current approaches circumvent the lack of high-quality labeled data in non-English languages by leveraging multilingual pretrained language models capable of cross-lingual transfer. However, these models require substantial task-specific fine-tuning across multiple languages, often perform poorly in languages with minimal representation in the pretraining corpus, and struggle to incorporate new languages after the pretraining phase. In this work, we present a novel modular dense retrieval model that learns from the rich data of a single high-resource language and effectively zero-shot transfers to a wide array of languages, thereby eliminating the need for language-specific labeled data. Our model, ColBERT-XM, demonstrates competitive performance against existing state-of-the-art multilingual retrievers trained on more extensive datasets in various languages. Further analysis reveals that our modular approach is highly data-efficient, effectively adapts to out-of-distribution data, and significantly reduces energy consumption and carbon emissions. By demonstrating its proficiency in zero-shot scenarios, ColBERT-XM marks a shift towards more sustainable and inclusive retrieval systems, enabling effective information accessibility in numerous languages. We publicly release our code and models for the community.
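
The title describes a multi-vector representation model in the ColBERT family. As background, such models typically score a query against a document with "late interaction": each query token vector is matched to its most similar document token vector, and the per-token maxima are summed (often called MaxSim). The sketch below illustrates that general scoring idea only. It is not the ColBERT-XM implementation, and the function names and the toy two-dimensional vectors are assumptions made for illustration.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so the dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def maxsim_score(query_vecs, doc_vecs):
    """Late-interaction relevance score.

    For each query token vector, take its maximum cosine similarity over
    all document token vectors, then sum those maxima over query tokens.
    """
    q = [normalize(v) for v in query_vecs]
    d = [normalize(v) for v in doc_vecs]
    return sum(
        max(sum(a * b for a, b in zip(qv, dv)) for dv in d)
        for qv in q
    )


# Toy token embeddings (hypothetical): one vector per token.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[2.0, 0.0], [0.0, 3.0]]  # covers both query tokens
doc_b = [[1.0, 0.0]]              # covers only the first query token

score_a = maxsim_score(query, doc_a)
score_b = maxsim_score(query, doc_b)
```

Here `doc_a` scores higher than `doc_b` because every query token finds a close match among its token vectors.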

* Under review. Code is available at https://github.com/ant-louis/xm-retrievers

Interpretable Long-Form Legal Question Answering with Retrieval-Augmented Large Language Models

Sep 29, 2023

Antoine Louis, Gijs van Dijck, Gerasimos Spanakis

Abstract: Many individuals are likely to face a legal dispute at some point in their lives, but their lack of understanding of how to navigate these complex issues often renders them vulnerable. The advancement of natural language processing opens new avenues for bridging this legal literacy gap through the development of automated legal aid systems. However, existing legal question answering (LQA) approaches often suffer from a narrow scope, being either confined to specific legal domains or limited to brief, uninformative responses. In this work, we propose an end-to-end methodology designed to generate long-form answers to any statutory law question, utilizing a "retrieve-then-read" pipeline. To support this approach, we introduce and release the Long-form Legal Question Answering (LLeQA) dataset, comprising 1,868 expert-annotated legal questions in the French language, complete with detailed answers rooted in pertinent legal provisions. Our experimental results demonstrate promising performance on automatic evaluation metrics, but a qualitative analysis uncovers areas for refinement. As one of the only comprehensive, expert-annotated long-form LQA datasets, LLeQA has the potential to not only accelerate research towards resolving a significant real-world issue, but also act as a rigorous benchmark for evaluating NLP models in specialized domains. We publicly release our code, data, and models.

* Under review. Code is available at https://github.com/maastrichtlawtech/lleqa

Finding the Law: Enhancing Statutory Article Retrieval via Graph Neural Networks

Jan 30, 2023

Antoine Louis, Gijs van Dijck, Gerasimos Spanakis

Abstract: Statutory article retrieval (SAR), the task of retrieving statute law articles relevant to a legal question, is a promising application of legal text processing. In particular, high-quality SAR systems can improve the work efficiency of legal professionals and provide basic legal assistance to citizens in need at no cost. Unlike traditional ad-hoc information retrieval, where each document is considered a complete source of information, SAR deals with texts whose full sense depends on complementary information from the topological organization of statute law. While existing works ignore these domain-specific dependencies, we propose a novel graph-augmented dense statute retriever (G-DSR) model that incorporates the structure of legislation via a graph neural network to improve dense retrieval performance. Experimental results show that our approach outperforms strong retrieval baselines on a real-world expert-annotated SAR dataset.

* EACL 2023. Code is available at https://github.com/maastrichtlawtech/gdsr
