Chris Tanner

Boundless Byte Pair Encoding: Breaking the Pre-tokenization Barrier

Mar 31, 2025
Via arXiv

No Free Labels: Limitations of LLM-as-a-Judge Without Human Grounding

Mar 07, 2025
Via arXiv

How Much is Enough? The Diminishing Returns of Tokenization Training Data

Feb 27, 2025
Via arXiv

Are Language Model Logits Calibrated?

Oct 21, 2024
Via arXiv

SEC-QA: A Systematic Evaluation Corpus for Financial QA

Jun 20, 2024
Via arXiv

Greed is All You Need: An Evaluation of Tokenizer Inference Methods

Mar 02, 2024
Via arXiv

Tokenization Is More Than Compression

Feb 28, 2024
Via arXiv

DocFinQA: A Long-Context Financial Reasoning Dataset

Jan 12, 2024
Via arXiv

BizBench: A Quantitative Reasoning Benchmark for Business and Finance

Nov 11, 2023
Via arXiv

A Graphical Approach to Document Layout Analysis

Aug 03, 2023
Via arXiv