Ari S. Morcos

Brevity is the soul of wit: Pruning long files for code generation

Jun 29, 2024
Via arXiv

Effective pruning of web-scale datasets based on complexity of concept clusters

Jan 09, 2024
Via arXiv

Decoding Data Quality via Synthetic Corruptions: Embedding-guided Pruning of Code Data

Dec 05, 2023
Via arXiv

D4: Improving LLM Pretraining via Document De-Duplication and Diversification

Aug 23, 2023
Via arXiv

PUG: Photorealistic and Semantically Controllable Synthetic Data for Representation Learning

Aug 08, 2023
Via arXiv

SemDeDup: Data-efficient learning at web-scale through semantic deduplication

Mar 22, 2023
Via arXiv

Emergence of Maps in the Memories of Blind Navigation Agents

Jan 30, 2023
Via arXiv

Beyond neural scaling laws: beating power law scaling via data pruning

Jun 29, 2022
Via arXiv

Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time

Mar 10, 2022
Via arXiv

Grounding inductive biases in natural images: invariance stems from variations in data

Jun 09, 2021
Via arXiv