
Xiaoying Jia

MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs

Feb 23, 2024

ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs

Oct 06, 2022

Accelerating Sparse DNN Models without Hardware-Support via Tile-Wise Sparsity

Aug 29, 2020