Yuval Pinter

Ben-Gurion University of the Negev

Splintering Nonconcatenative Languages for Better Tokenization

Mar 18, 2025

Token-Level Privacy in Large Language Models

Mar 05, 2025

How Much is Enough? The Diminishing Returns of Tokenization Training Data

Feb 27, 2025

Information Types in Product Reviews

Feb 20, 2025

Don't Touch My Diacritics

Oct 31, 2024

Protecting Privacy in Classifiers by Token Manipulation

Jul 01, 2024

Evaluating Subword Tokenization: Alien Subword Composition and OOV Generalization Challenge

Apr 20, 2024

An Analysis of BPE Vocabulary Trimming in Neural Machine Translation

Mar 30, 2024

BiVert: Bidirectional Vocabulary Evaluation using Relations for Machine Translation

Mar 06, 2024

Greed is All You Need: An Evaluation of Tokenizer Inference Methods

Mar 02, 2024