Yongqiang Guo

Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning

Aug 26, 2024
Via arXiv

TRANSOM: An Efficient Fault-Tolerant System for Training LLMs

Oct 18, 2023
Via arXiv