
Yibo Zhu

ByteDance

Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction

Feb 18, 2025

DSV: Exploiting Dynamic Sparsity to Accelerate Large-Scale Video DiT Training

Feb 11, 2025

InfinitePOD: Building Datacenter-Scale High-Bandwidth Domain for LLM with Optical Circuit Switching Transceivers

Feb 07, 2025

QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices

Jul 02, 2024

CDMPP: A Device-Model Agnostic Framework for Latency Prediction of Tensor Programs

Nov 17, 2023

ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs

Oct 06, 2022

ByteComp: Revisiting Gradient Compression in Distributed Training

Jun 06, 2022

dPRO: A Generic Profiling and Optimization System for Expediting Distributed DNN Training

May 18, 2022

Aryl: An Elastic Cluster Scheduler for Deep Learning

Feb 16, 2022

BGL: GPU-Efficient GNN Training by Optimizing Graph Data I/O and Preprocessing

Dec 16, 2021