Zejiang Shen

Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research

Jan 31, 2024
Via arXiv

American Stories: A Large-Scale Structured Text Dataset of Historical U.S. Newspapers

Aug 24, 2023
Via arXiv

Are Layout-Infused Language Models Robust to Layout Distribution Shifts? A Case Study with Scientific Documents

Jun 01, 2023
Via arXiv

Beyond Summarization: Designing AI Support for Real-World Expository Writing Tasks

Apr 05, 2023
Via arXiv

The Semantic Reader Project: Augmenting Scholarly Documents through AI-Powered Interactive Reading Interfaces

Mar 25, 2023
Via arXiv

The Semantic Scholar Open Data Platform

Jan 24, 2023
Via arXiv

Multi-LexSum: Real-World Summaries of Civil Rights Lawsuits at Multiple Granularities

Jun 23, 2022
Via arXiv

Don't Say What You Don't Know: Improving the Consistency of Abstractive Summarization by Constraining Beam Search

Mar 16, 2022
Via arXiv

Incorporating Visual Layout Structures for Scientific Text Classification

Jun 21, 2021
Via arXiv

LayoutParser: A Unified Toolkit for Deep Learning Based Document Image Analysis

Mar 29, 2021
Via arXiv