Ankush Garg

An Empirical Study on Noisy Data and LLM Pretraining Loss Divergence

Feb 02, 2026
Via arXiv

Training LLMs with Fault Tolerant HSDP on 100,000 GPUs

Jan 30, 2026
Via arXiv

The Llama 4 Herd: Architecture, Training, Evaluation, and Deployment Notes

Jan 15, 2026
Via arXiv

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Mar 08, 2024
Via arXiv

Gemini: A Family of Highly Capable Multimodal Models

Dec 19, 2023
Via arXiv

Order Matters in the Presence of Dataset Imbalance for Multilingual Learning

Dec 11, 2023
Via arXiv

The Devil is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation

Aug 14, 2023
Via arXiv

Benchmarking Neural Network Training Algorithms

Jun 12, 2023
Via arXiv

Binarized Neural Machine Translation

Feb 09, 2023
Via arXiv

Do Current Multi-Task Optimization Methods in Deep Learning Even Help?

Sep 23, 2022
Via arXiv