Zhiquan Lai

Accurate and Efficient Fine-Tuning of Quantized Large Language Models Through Optimal Balance

Jul 24, 2024

Merak: An Efficient Distributed DNN Training Framework with Automated 3D Parallelism for Giant Foundation Models

Jun 21, 2022

DELTA: Dynamically Optimizing GPU Memory beyond Tensor Recomputation

Mar 30, 2022

EmbRace: Accelerating Sparse Communication for Distributed Training of NLP Neural Networks

Oct 18, 2021

S2 Reducer: High-Performance Sparse Communication to Accelerate Distributed Deep Learning

Oct 05, 2021

Hierarchical Adaptive Pooling by Capturing High-order Dependency for Graph Representation Learning

Apr 13, 2021

ADMMiRNN: Training RNN with Stable Convergence via An Efficient ADMM Approach

Jun 17, 2020