Picture for Mostofa Patwary

Mostofa Patwary

Nemotron-CC: Transforming Common Crawl into a Refined Long-Horizon Pretraining Dataset

Add code
Dec 03, 2024
Viaarxiv icon

MIND: Math Informed syNthetic Dialogues for Pretraining LLMs

Add code
Oct 15, 2024
Figure 1 for MIND: Math Informed syNthetic Dialogues for Pretraining LLMs
Figure 2 for MIND: Math Informed syNthetic Dialogues for Pretraining LLMs
Figure 3 for MIND: Math Informed syNthetic Dialogues for Pretraining LLMs
Figure 4 for MIND: Math Informed syNthetic Dialogues for Pretraining LLMs
Viaarxiv icon

LLM Pruning and Distillation in Practice: The Minitron Approach

Add code
Aug 21, 2024
Figure 1 for LLM Pruning and Distillation in Practice: The Minitron Approach
Figure 2 for LLM Pruning and Distillation in Practice: The Minitron Approach
Figure 3 for LLM Pruning and Distillation in Practice: The Minitron Approach
Figure 4 for LLM Pruning and Distillation in Practice: The Minitron Approach
Viaarxiv icon

Compact Language Models via Pruning and Knowledge Distillation

Add code
Jul 19, 2024
Figure 1 for Compact Language Models via Pruning and Knowledge Distillation
Figure 2 for Compact Language Models via Pruning and Knowledge Distillation
Figure 3 for Compact Language Models via Pruning and Knowledge Distillation
Figure 4 for Compact Language Models via Pruning and Knowledge Distillation
Viaarxiv icon

Reuse, Don't Retrain: A Recipe for Continued Pretraining of Language Models

Add code
Jul 09, 2024
Viaarxiv icon

Data, Data Everywhere: A Guide for Pretraining Dataset Construction

Add code
Jul 08, 2024
Viaarxiv icon

Nemotron-4 340B Technical Report

Add code
Jun 17, 2024
Figure 1 for Nemotron-4 340B Technical Report
Figure 2 for Nemotron-4 340B Technical Report
Figure 3 for Nemotron-4 340B Technical Report
Figure 4 for Nemotron-4 340B Technical Report
Viaarxiv icon

StarCoder 2 and The Stack v2: The Next Generation

Add code
Feb 29, 2024
Figure 1 for StarCoder 2 and The Stack v2: The Next Generation
Figure 2 for StarCoder 2 and The Stack v2: The Next Generation
Figure 3 for StarCoder 2 and The Stack v2: The Next Generation
Figure 4 for StarCoder 2 and The Stack v2: The Next Generation
Viaarxiv icon

Nemotron-4 15B Technical Report

Add code
Feb 27, 2024
Viaarxiv icon

Adding Instructions during Pretraining: Effective Way of Controlling Toxicity in Language Models

Add code
Feb 14, 2023
Viaarxiv icon