Igor Gitman

OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data

Oct 02, 2024
Via arXiv

Nemotron-4 340B Technical Report

Jun 17, 2024
Via arXiv

OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset

Feb 15, 2024
Via arXiv

Confidence-based Ensembles of End-to-End Speech Recognition Models

Jun 27, 2023
Via arXiv

Powerful and Extensible WFST Framework for RNN-Transducer Losses

Mar 18, 2023
Via arXiv

Understanding the Role of Momentum in Stochastic Gradient Methods

Oct 30, 2019
Via arXiv

OpenSeq2Seq: extensible toolkit for distributed and mixed precision training of sequence-to-sequence models

May 25, 2018
Via arXiv

Novel Prediction Techniques Based on Clusterwise Linear Regression

Apr 28, 2018
Via arXiv

Convergence Analysis of Gradient Descent Algorithms with Proportional Updates

Jan 09, 2018
Via arXiv

Comparison of Batch Normalization and Weight Normalization Algorithms for the Large-scale Image Classification

Oct 07, 2017
Via arXiv