Craig W. Schmidt

How Much is Enough? The Diminishing Returns of Tokenization Training Data

Feb 27, 2025
Via arXiv

Greed is All You Need: An Evaluation of Tokenizer Inference Methods

Mar 02, 2024
Via arXiv

Tokenization Is More Than Compression

Feb 28, 2024
Via arXiv

Improving a tf-idf weighted document vector embedding

Feb 26, 2019
Via arXiv