Hao-Jun Michael Shi

Adaptive Batch Sizes Using Non-Euclidean Gradient Noise Scales for Stochastic Sign and Spectral Descent

Feb 03, 2026
Via arXiv

Smoothing DiLoCo with Primal Averaging for Faster Training of LLMs

Dec 18, 2025
Via arXiv

A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale

Sep 12, 2023
Via arXiv

Compositional Embeddings Using Complementary Partitions for Memory-Efficient Recommendation Systems

Sep 04, 2019
Via arXiv

Deep Learning Recommendation Model for Personalization and Recommendation Systems

May 31, 2019
Via arXiv

A Progressive Batching L-BFGS Method for Machine Learning

May 30, 2018
Via arXiv

A Primer on Coordinate Descent Algorithms

Jan 12, 2017
Via arXiv

Practical Algorithms for Learning Near-Isometric Linear Embeddings

Apr 22, 2016
Via arXiv