Sewoong Oh

Economics of Sourcing Human Data

Feb 11, 2025

Scalable Fingerprinting of Large Language Models

Feb 11, 2025

OML: Open, Monetizable, and Loyal AI

Nov 01, 2024

Randomization Techniques to Mitigate the Risk of Copyright Infringement

Aug 21, 2024

Better Alignment with Instruction Back-and-Forth Translation

Aug 08, 2024

Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data?

Jul 24, 2024

Understanding the Gains from Repeated Self-Distillation

Jul 05, 2024

PLeaS -- Merging Models with Permutations and Least Squares

Jul 02, 2024

DataComp-LM: In search of the next generation of training sets for language models

Jun 18, 2024

Multilingual Diversity Improves Vision-Language Representations

May 27, 2024