
Jian Sha

Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs

Mar 07, 2025

EDiT: A Local-SGD-Based Efficient Distributed Training Method for Large Language Models

Dec 10, 2024

Couler: Unified Machine Learning Workflow Optimization in Cloud

Mar 12, 2024

ASPEN: High-Throughput LoRA Fine-Tuning of Large Language Models with a Single GPU

Dec 05, 2023