Jonathan Hayase

Broken Tokens? Your Language Model can Secretly Handle Non-Canonical Tokenizations

Jun 23, 2025

Sampling from Your Language Model One Byte at a Time

Jun 17, 2025

SuperBPE: Space Travel for Language Models

Mar 17, 2025

Scalable Fingerprinting of Large Language Models

Feb 11, 2025

OML: Open, Monetizable, and Loyal AI

Nov 01, 2024

Monge-Kantorovich Fitting With Sobolev Budgets

Sep 25, 2024

Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data?

Jul 24, 2024

PLeaS -- Merging Models with Permutations and Least Squares

Jul 02, 2024

Insufficient Statistics Perturbation: Stable Estimators for Private Least Squares

Apr 23, 2024

Query-Based Adversarial Prompt Generation

Feb 19, 2024