Marco Cognetta

Tokenization as Finite-State Transduction

Oct 21, 2024

Distributional Properties of Subword Regularization

Aug 21, 2024

An Analysis of BPE Vocabulary Trimming in Neural Machine Translation

Mar 30, 2024

Two Counterexamples to Tokenization and the Noiseless Channel

Feb 29, 2024