Mohammad Shoeybi

Efficient Hybrid Language Model Compression through Group-Aware SSM Pruning

Apr 15, 2025

Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models

Apr 10, 2025

From 128K to 4M: Efficient Training of Ultra-Long Context Large Language Models

Apr 08, 2025

Retro-Search: Exploring Untaken Paths for Deeper and Efficient Reasoning

Apr 06, 2025

AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling

Dec 19, 2024

Maximize Your Data's Potential: Enhancing LLM Accuracy with Two-Phase Pretraining

Dec 18, 2024

Nemotron-CC: Transforming Common Crawl into a Refined Long-Horizon Pretraining Dataset

Dec 03, 2024

MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs

Nov 04, 2024

MIND: Math Informed syNthetic Dialogues for Pretraining LLMs

Oct 15, 2024

Upcycling Large Language Models into Mixture of Experts

Oct 10, 2024