Yanghua Peng

ByteDance

HybridFlow: A Flexible and Efficient RLHF Framework

Sep 28, 2024

Optimus: Accelerating Large-Scale Multi-Modal LLM Training by Bubble Exploitation

Aug 07, 2024

ByteCheckpoint: A Unified Checkpointing System for LLM Development

Jul 29, 2024

QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices

Jul 02, 2024

LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization

Mar 02, 2024

MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs

Feb 23, 2024

CDMPP: A Device-Model Agnostic Framework for Latency Prediction of Tensor Programs

Nov 17, 2023

dPRO: A Generic Profiling and Optimization System for Expediting Distributed DNN Training

May 18, 2022

BGL: GPU-Efficient GNN Training by Optimizing Graph Data I/O and Preprocessing

Dec 16, 2021

DL2: A Deep Learning-driven Scheduler for Deep Learning Clusters

Sep 13, 2019