
Chris Tanner

How Much is Enough? The Diminishing Returns of Tokenization Training Data

Feb 27, 2025
Via arXiv

Are Language Model Logits Calibrated?

Oct 21, 2024
Via arXiv

SEC-QA: A Systematic Evaluation Corpus for Financial QA

Jun 20, 2024
Via arXiv

Greed is All You Need: An Evaluation of Tokenizer Inference Methods

Mar 02, 2024
Via arXiv

Tokenization Is More Than Compression

Feb 28, 2024
Via arXiv

DocFinQA: A Long-Context Financial Reasoning Dataset

Jan 12, 2024
Via arXiv

BizBench: A Quantitative Reasoning Benchmark for Business and Finance

Nov 11, 2023
Via arXiv

A Graphical Approach to Document Layout Analysis

Aug 03, 2023
Via arXiv

What happens before and after: Multi-Event Commonsense in Event Coreference Resolution

Feb 21, 2023
Via arXiv

LineCap: Line Charts for Data Visualization Captioning Models

Jul 15, 2022
Via arXiv