Sewoong Oh

OML: Open, Monetizable, and Loyal AI

Nov 01, 2024
Via arXiv

Randomization Techniques to Mitigate the Risk of Copyright Infringement

Aug 21, 2024
Via arXiv

Better Alignment with Instruction Back-and-Forth Translation

Aug 08, 2024
Via arXiv

Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data?

Jul 24, 2024
Via arXiv

Understanding the Gains from Repeated Self-Distillation

Jul 05, 2024
Via arXiv

PLeaS -- Merging Models with Permutations and Least Squares

Jul 02, 2024
Via arXiv

DataComp-LM: In search of the next generation of training sets for language models

Jun 18, 2024
Via arXiv

Multilingual Diversity Improves Vision-Language Representations

May 27, 2024
Via arXiv

Air Gap: Protecting Privacy-Conscious Conversational Agents

May 08, 2024
Via arXiv

Improved Communication-Privacy Trade-offs in $L_2$ Mean Estimation under Streaming Differential Privacy

May 02, 2024
Via arXiv