
Kazuki Osawa

ASDL: A Unified Interface for Gradient Preconditioning in PyTorch

May 08, 2023
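
Gradient preconditioning means transforming the raw gradient with an (approximate) curvature matrix before the parameter update. As a minimal illustration of the general idea only (not ASDL's actual interface; the model, loss, and RMSProp-style running average below are illustrative choices), a diagonal preconditioner in plain PyTorch might look like this:

    # Generic diagonal gradient preconditioning sketch (NOT the ASDL API).
    # A running second-moment estimate plays the role of an approximate
    # diagonal curvature matrix that rescales the gradient per coordinate.
    import torch

    model = torch.nn.Linear(10, 1)
    loss_fn = torch.nn.MSELoss()
    x, y = torch.randn(32, 10), torch.randn(32, 1)

    curvature = {p: torch.zeros_like(p) for p in model.parameters()}
    lr, beta, eps = 0.1, 0.9, 1e-8

    for step in range(100):
        loss = loss_fn(model(x), y)
        model.zero_grad()
        loss.backward()
        with torch.no_grad():
            for p in model.parameters():
                g = p.grad
                curvature[p].mul_(beta).addcmul_(g, g, value=1 - beta)  # update diagonal estimate
                p -= lr * g / (curvature[p].sqrt() + eps)               # preconditioned step

A unified interface, as the title suggests, would let the diagonal estimate above be swapped for richer curvature approximations behind the same update loop.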

PipeFisher: Efficient Training of Large Language Models Using Pipelining and Fisher Information Matrices

Nov 25, 2022
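
For context on the title's terminology (a standard definition, not a claim about the paper's specific method): the Fisher information matrix of a model p(y|x, θ) and the resulting damped natural-gradient update are

    F(\theta) = \mathbb{E}_{x,\, y \sim p_\theta}\left[ \nabla_\theta \log p(y \mid x, \theta)\, \nabla_\theta \log p(y \mid x, \theta)^{\top} \right]

    \theta_{t+1} = \theta_t - \eta \left( F(\theta_t) + \rho I \right)^{-1} \nabla_\theta \mathcal{L}(\theta_t)

where the damping term ρI is commonly added to keep the inversion well conditioned.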

Understanding Gradient Regularization in Deep Learning: Efficient Finite-Difference Computation and Implicit Bias

Oct 06, 2022
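
The "efficient finite-difference computation" in the title refers, in general, to the fact that the gradient-norm penalty R(θ) = ‖∇L(θ)‖² has gradient 2H∇L, and the Hessian-vector product Hv can be approximated from two gradient evaluations: Hv ≈ (∇L(θ + εv) − ∇L(θ)) / ε. A hedged sketch of that generic scheme (not necessarily the paper's exact algorithm; model, loss, and hyperparameters below are illustrative):

    # Gradient regularization via finite differences: two backward passes
    # per step approximate the gradient of L(θ) + λ‖∇L(θ)‖².
    import torch

    model = torch.nn.Linear(10, 1)
    loss_fn = torch.nn.MSELoss()
    x, y = torch.randn(32, 10), torch.randn(32, 1)
    params = list(model.parameters())
    lr, lam, eps = 0.05, 0.1, 1e-3

    def grads():
        model.zero_grad()
        loss_fn(model(x), y).backward()
        return [p.grad.clone() for p in params]

    for step in range(100):
        g = grads()                                  # ∇L(θ)
        with torch.no_grad():
            for p, gi in zip(params, g):
                p += eps * gi                        # perturb θ → θ + ε∇L
        g_pert = grads()                             # ∇L(θ + ε∇L)
        with torch.no_grad():
            for p, gi, gp in zip(params, g, g_pert):
                p -= eps * gi                        # undo the perturbation
                hvp = (gp - gi) / eps                # ≈ H∇L by finite differences
                p -= lr * (gi + 2 * lam * hvp)       # step on L(θ) + λ‖∇L(θ)‖²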

Neural Graph Databases

Sep 20, 2022

Efficient Quantized Sparse Matrix Operations on Tensor Cores

Sep 14, 2022
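
As background for the title, symmetric int8 quantization combined with a CSR sparse product can be sketched in stock PyTorch (an illustrative sketch only; the paper concerns custom Tensor Core kernels, which this pure-PyTorch version does not reproduce, and it keeps the quantized values in float so that the sparse matmul runs):

    import torch

    A = torch.randn(64, 64)
    A[torch.rand_like(A) > 0.1] = 0.0                  # make A roughly 90% sparse

    scale = A.abs().max() / 127                        # symmetric int8 scale
    A_q = (A / scale).round().clamp(-127, 127)         # quantized values (int8 range)
    A_csr = A_q.to_sparse_csr()                        # CSR storage of quantized values

    x = torch.randn(64, 8)
    y = scale * (A_csr @ x)                            # sparse matmul, then dequantize
    print((y - A @ x).abs().max())                     # quantization error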

Understanding Approximate Fisher Information for Fast Convergence of Natural Gradient Descent in Wide Neural Networks

Oct 23, 2020

Scalable and Practical Natural Gradient for Large-Scale Deep Learning

Feb 13, 2020
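
Natural-gradient methods are typically made scalable with Kronecker-factored (K-FAC-style) approximations of the Fisher matrix, which avoid ever forming or inverting the full matrix. The single-layer sketch below is a generic illustration of that family of methods, not this paper's distributed implementation:

    # K-FAC-style preconditioning for one linear layer: the layer's Fisher
    # block is approximated as A ⊗ G, so (A ⊗ G)⁻¹ vec(∇W) = G⁻¹ ∇W A⁻¹.
    import torch

    torch.manual_seed(0)
    W = torch.randn(5, 10, requires_grad=True)         # one linear layer's weight
    a = torch.randn(32, 10)                            # input activations
    y = torch.randn(32, 5)

    out = a @ W.T
    loss = ((out - y) ** 2).mean()
    (g_out,) = torch.autograd.grad(loss, out, retain_graph=True)
    (g_W,) = torch.autograd.grad(loss, W)

    n, damping = a.shape[0], 1e-3
    A = a.T @ a / n + damping * torch.eye(10)          # input factor  E[a aᵀ]
    G = g_out.T @ g_out / n + damping * torch.eye(5)   # output factor E[g gᵀ]

    nat_grad = torch.linalg.solve(G, g_W) @ torch.linalg.inv(A)
    with torch.no_grad():
        W -= 0.1 * nat_grad                            # natural-gradient step

Only the small per-layer factors A and G are inverted, which is what makes this class of methods practical at scale.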

Practical Deep Learning with Bayesian Principles

Jun 06, 2019
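
"Bayesian principles" in deep learning commonly means variational inference over the weights. A minimal mean-field Gaussian sketch with the reparameterization trick (a generic baseline, not the natural-gradient variational optimizer this paper develops):

    # Mean-field Gaussian variational inference on a linear model:
    # learn a distribution N(mu, sigma²) over weights instead of a point estimate.
    import torch
    import torch.nn.functional as F

    mu = torch.zeros(10, requires_grad=True)                 # variational mean
    rho = torch.full((10,), -3.0, requires_grad=True)        # std via softplus(rho)
    x, y = torch.randn(64, 10), torch.randn(64)
    opt = torch.optim.Adam([mu, rho], lr=1e-2)

    for step in range(200):
        sigma = F.softplus(rho)
        w = mu + sigma * torch.randn_like(mu)                # reparameterized sample
        nll = ((x @ w - y) ** 2).mean()                      # data-fit term
        kl = 0.5 * (sigma**2 + mu**2 - 1 - 2 * sigma.log()).sum()  # KL to N(0, I)
        loss = nll + kl / x.shape[0]
        opt.zero_grad()
        loss.backward()
        opt.step()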

Second-order Optimization Method for Large Mini-batch: Training ResNet-50 on ImageNet in 35 Epochs

Dec 05, 2018