Benjamin Minixhofer

Retrofitting (Large) Language Models with Dynamic Tokenization

Nov 27, 2024
Via arXiv

Segment Any Text: A Universal Approach for Robust, Efficient and Adaptable Sentence Segmentation

Jun 24, 2024
Via arXiv

Zero-Shot Tokenizer Transfer

May 13, 2024
Via arXiv

Where's the Point? Self-Supervised Multilingual Punctuation-Agnostic Sentence Segmentation

May 30, 2023
Via arXiv

CompoundPiece: Evaluating and Improving Decompounding Performance of Language Models

May 23, 2023
Via arXiv

HumSet: Dataset of Multilingual Information Extraction and Classification for Humanitarian Crisis Response

Oct 10, 2022
Via arXiv

WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models

Dec 13, 2021
Via arXiv

Enhancing Transformers with Gradient Boosted Decision Trees for NLI Fine-Tuning

May 08, 2021
Via arXiv