Jakub Harašta

Lex Rosetta: Transfer of Predictive Models Across Languages, Jurisdictions, and Legal Domains

Dec 15, 2021

Jaromir Savelka, Hannes Westermann, Karim Benyekhlef, Charlotte S. Alexander, Jayla C. Grant, David Restrepo Amariles, Rajaa El Hamdani, Sébastien Meeùs, Michał Araszkiewicz, Kevin D. Ashley (+8 more)

Abstract: In this paper, we examine the use of multi-lingual sentence embeddings to transfer predictive models for functional segmentation of adjudicatory decisions across jurisdictions, legal systems (common and civil law), languages, and domains (i.e., contexts). Mechanisms for utilizing linguistic resources outside of their original context have significant potential benefits in AI & Law because differences between legal systems, languages, or traditions often block wider adoption of research outcomes. We analyze the use of Language-Agnostic Sentence Representations in sequence labeling models using Gated Recurrent Units (GRUs) that are transferable across languages. To investigate transfer between different contexts we developed an annotation scheme for functional segmentation of adjudicatory decisions. We found that models generalize beyond the contexts on which they were trained (e.g., a model trained on administrative decisions from the US can be applied to criminal law decisions from Italy). Further, we found that training the models on multiple contexts increases robustness and improves overall performance when evaluating on previously unseen contexts. Finally, we found that pooling the training data from all the contexts enhances the models' in-context performance.

* In Proceedings of ICAIL 2021, pp. 129-138. 2021
* 10 pages

Citation Data of Czech Apex Courts

Feb 06, 2020

Jakub Harašta, Tereza Novotná, Jaromír Šavelka

Abstract: In this paper, we introduce the citation data of the Czech apex courts (Supreme Court, Supreme Administrative Court and Constitutional Court). This dataset was automatically extracted from the corpus of texts of Czech court decisions, CzCDC 1.0. We obtained the citation data by building a natural language processing pipeline for extraction of the court decision identifiers. The pipeline included (i) a document segmentation model and (ii) a reference recognition model. Furthermore, the dataset was manually processed to achieve high-quality citation data as a base for subsequent qualitative and quantitative analyses. The dataset will be made available to the general public.

The Czech Court Decisions Corpus (CzCDC): Availability as the First Step

Oct 21, 2019

Tereza Novotná, Jakub Harašta

Abstract: In this paper, we describe the Czech Court Decision Corpus (CzCDC). CzCDC is a dataset of 237,723 decisions published by the Czech apex (or top-tier) courts, namely the Supreme Court, the Supreme Administrative Court and the Constitutional Court. All the decisions were published between 1st January 1993 and 30th September 2018. Court decisions are available on the webpages of the respective courts or via commercial databases of legal information. As a result, researchers interested in these decisions often have to turn either to the respective court or to a commercial provider, which leads to delays and additional costs. These are further exacerbated by the lack of an inter-court standard for the data format in which courts provide their decisions. Additionally, courts' databases often lack proper documentation. Our goal is to make the dataset of court decisions freely available online in a consistent (plain) format to lower the cost associated with obtaining data for future research. We believe that simplified access to court decisions through the CzCDC could benefit other researchers. In this paper, we describe the processing of decisions before their inclusion into CzCDC and basic statistics of the dataset. This dataset contains plain texts of court decisions, and these texts are not annotated for any grammatical or syntactical features.
