Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Emad Elwany

DeeperDive: The Unreasonable Effectiveness of Weak Supervision in Document Understanding A Case Study in Collaboration with UiPath Inc

Aug 17, 2022

Emad Elwany, Allison Hegel, Marina Shah, Brendan Roof, Genevieve Peaslee, Quentin Rivet

Figure 1 for DeeperDive: The Unreasonable Effectiveness of Weak Supervision in Document Understanding A Case Study in Collaboration with UiPath Inc

Figure 2 for DeeperDive: The Unreasonable Effectiveness of Weak Supervision in Document Understanding A Case Study in Collaboration with UiPath Inc

Figure 3 for DeeperDive: The Unreasonable Effectiveness of Weak Supervision in Document Understanding A Case Study in Collaboration with UiPath Inc

Figure 4 for DeeperDive: The Unreasonable Effectiveness of Weak Supervision in Document Understanding A Case Study in Collaboration with UiPath Inc

Abstract:Weak supervision has been applied to various Natural Language Understanding tasks in recent years. Due to technical challenges with scaling weak supervision to work on long-form documents, spanning up to hundreds of pages, applications in the document understanding space have been limited. At Lexion, we built a weak supervision-based system tailored for long-form (10-200 pages long) PDF documents. We use this platform for building dozens of language understanding models and have applied it successfully to various domains, from commercial agreements to corporate formation documents. In this paper, we demonstrate the effectiveness of supervised learning with weak supervision in a situation with limited time, workforce, and training data. We built 8 high quality machine learning models in the span of one week, with the help of a small team of just 3 annotators working with a dataset of under 300 documents. We share some details about our overall architecture, how we utilize weak supervision, and what results we are able to achieve. We also include the dataset for researchers who would like to experiment with alternate approaches or refine ours. Furthermore, we shed some light on the additional complexities that arise when working with poorly scanned long-form documents in PDF format, and some of the techniques that help us achieve state-of-the-art performance on such data.

Via

Access Paper or Ask Questions

The Law of Large Documents: Understanding the Structure of Legal Contracts Using Visual Cues

Jul 16, 2021

Allison Hegel, Marina Shah, Genevieve Peaslee, Brendan Roof, Emad Elwany

Figure 1 for The Law of Large Documents: Understanding the Structure of Legal Contracts Using Visual Cues

Figure 2 for The Law of Large Documents: Understanding the Structure of Legal Contracts Using Visual Cues

Figure 3 for The Law of Large Documents: Understanding the Structure of Legal Contracts Using Visual Cues

Figure 4 for The Law of Large Documents: Understanding the Structure of Legal Contracts Using Visual Cues

Abstract:Large, pre-trained transformer models like BERT have achieved state-of-the-art results on document understanding tasks, but most implementations can only consider 512 tokens at a time. For many real-world applications, documents can be much longer, and the segmentation strategies typically used on longer documents miss out on document structure and contextual information, hurting their results on downstream tasks. In our work on legal agreements, we find that visual cues such as layout, style, and placement of text in a document are strong features that are crucial to achieving an acceptable level of accuracy on long documents. We measure the impact of incorporating such visual cues, obtained via computer vision methods, on the accuracy of document understanding tasks including document segmentation, entity extraction, and attribute classification. Our method of segmenting documents based on structural metadata out-performs existing methods on four long-document understanding tasks as measured on the Contract Understanding Atticus Dataset.

* Document Intelligence Workshop at KDD, 2021

Via

Access Paper or Ask Questions

BERT Goes to Law School: Quantifying the Competitive Advantage of Access to Large Legal Corpora in Contract Understanding

Nov 01, 2019

Emad Elwany, Dave Moore, Gaurav Oberoi

Figure 1 for BERT Goes to Law School: Quantifying the Competitive Advantage of Access to Large Legal Corpora in Contract Understanding

Figure 2 for BERT Goes to Law School: Quantifying the Competitive Advantage of Access to Large Legal Corpora in Contract Understanding

Figure 3 for BERT Goes to Law School: Quantifying the Competitive Advantage of Access to Large Legal Corpora in Contract Understanding

Figure 4 for BERT Goes to Law School: Quantifying the Competitive Advantage of Access to Large Legal Corpora in Contract Understanding

Abstract:Fine-tuning language models, such as BERT, on domain specific corpora has proven to be valuable in domains like scientific papers and biomedical text. In this paper, we show that fine-tuning BERT on legal documents similarly provides valuable improvements on NLP tasks in the legal domain. Demonstrating this outcome is significant for analyzing commercial agreements, because obtaining large legal corpora is challenging due to their confidential nature. As such, we show that having access to large legal corpora is a competitive advantage for commercial applications, and academic research on analyzing contracts.

Via

Access Paper or Ask Questions

Calendar.help: Designing a Workflow-Based Scheduling Agent with Humans in the Loop

Mar 24, 2017

Justin Cranshaw, Emad Elwany, Todd Newman, Rafal Kocielnik, Bowen Yu, Sandeep Soni, Jaime Teevan, Andrés Monroy-Hernández

Figure 1 for Calendar.help: Designing a Workflow-Based Scheduling Agent with Humans in the Loop

Figure 2 for Calendar.help: Designing a Workflow-Based Scheduling Agent with Humans in the Loop

Figure 3 for Calendar.help: Designing a Workflow-Based Scheduling Agent with Humans in the Loop

Figure 4 for Calendar.help: Designing a Workflow-Based Scheduling Agent with Humans in the Loop

Abstract:Although information workers may complain about meetings, they are an essential part of their work life. Consequently, busy people spend a significant amount of time scheduling meetings. We present Calendar.help, a system that provides fast, efficient scheduling through structured workflows. Users interact with the system via email, delegating their scheduling needs to the system as if it were a human personal assistant. Common scheduling scenarios are broken down using well-defined workflows and completed as a series of microtasks that are automated when possible and executed by a human otherwise. Unusual scenarios fall back to a trained human assistant who executes them as unstructured macrotasks. We describe the iterative approach we used to develop Calendar.help, and share the lessons learned from scheduling thousands of meetings during a year of real-world deployments. Our findings provide insight into how complex information tasks can be broken down into repeatable components that can be executed efficiently to improve productivity.

* 10 pages

Via

Access Paper or Ask Questions