Jared Casper

Nemotron-4 340B Technical Report
Jun 17, 2024

An Empirical Study of Mamba-based Language Models
Jun 12, 2024

Nemotron-4 15B Technical Report
Feb 27, 2024

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
Nov 09, 2022

Reducing Activation Recomputation in Large Transformer Models
May 10, 2022

Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
Feb 04, 2022

Efficient Large-Scale Language Model Training on GPU Clusters
Apr 09, 2021

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Oct 05, 2019

Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
Dec 08, 2015

Deep Speech: Scaling up end-to-end speech recognition
Dec 19, 2014