An Xu

Distributed Sign Momentum with Local Steps for Training Transformers

Nov 26, 2024
Via arXiv

MoE-Pruner: Pruning Mixture-of-Experts Large Language Model using the Hints from Its Router

Oct 15, 2024
Via arXiv

Closing the Generalization Gap of Cross-silo Federated Medical Image Segmentation

Mar 18, 2022
Via arXiv

Auto-FedRL: Federated Hyperparameter Optimization for Multi-institutional Medical Image Segmentation

Mar 12, 2022
Via arXiv

Double Momentum SGD for Federated Learning

Feb 08, 2021
Via arXiv

Privacy-Preserving Asynchronous Federated Learning Algorithms for Multi-Party Vertically Collaborative Learning

Aug 14, 2020
Via arXiv

Training Faster with Compressed Gradient

Aug 13, 2020
Via arXiv

Exploit Where Optimizer Explores via Residuals

Apr 11, 2020
Via arXiv

Optimal Gradient Quantization Condition for Communication-Efficient Distributed Training

Feb 25, 2020
Via arXiv

Diversely Stale Parameters for Efficient Training of CNNs

Sep 24, 2019
Via arXiv