Dachao Lin

LocMoE+: Enhanced Router with Token Feature Awareness for Efficient LLM Pre-Training

May 24, 2024

Stochastic Distributed Optimization under Average Second-order Similarity: Algorithms and Analysis

Apr 15, 2023

On the Convergence of Policy in Unregularized Policy Mirror Descent

May 19, 2022

Global Convergence Analysis of Deep Linear Networks with A One-neuron Layer

Jan 08, 2022

Directional Convergence Analysis under Spherically Symmetric Distribution

May 09, 2021

Meta-Regularization: An Approach to Adaptive Choice of the Learning Rate in Gradient Descent

Apr 12, 2021

Landscape of Sparse Linear Network: A Brief Investigation

Sep 16, 2020

Optimal Quantization for Batch Normalization in Neural Network Deployments and Beyond

Aug 30, 2020

Towards Understanding the Importance of Noise in Training Neural Networks

Sep 07, 2019

Towards Better Generalization: BP-SVRG in Training Deep Neural Networks

Aug 18, 2019