Shaoduo Gan

SqueezeAttention: 2D Management of KV-Cache in LLM Inference via Layer-wise Optimal Budget

Apr 07, 2024
Via arXiv

Few-shot Named Entity Recognition with Entity-level Prototypical Network Enhanced by Dispersedly Distributed Prototypes

Aug 17, 2022
Via arXiv

Stochastic Gradient Descent without Full Data Shuffle

Jun 12, 2022
Via arXiv

FRuDA: Framework for Distributed Adversarial Domain Adaptation

Dec 26, 2021
Via arXiv

BAGUA: Scaling up Distributed Learning with System Relaxations

Jul 12, 2021
Via arXiv

Towards Demystifying Serverless Machine Learning Training

May 17, 2021
Via arXiv

1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed

Feb 04, 2021
Via arXiv

APMSqueeze: A Communication Efficient Adam-Preconditioned Momentum SGD Algorithm

Aug 28, 2020
Via arXiv

Communication Compression for Decentralized Training

Sep 27, 2018
Via arXiv