Chengming Zhang

SDP4Bit: Toward 4-bit Communication Quantization in Sharded Data Parallelism for LLM Training
Oct 20, 2024

DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies
Oct 11, 2023

Benchmarking and In-depth Performance Study of Large Language Models on Habana Gaudi Processors
Sep 29, 2023

DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
Sep 25, 2023

PapagAI: Automated Feedback for Reflective Essays
Jul 10, 2023

HEAT: A Highly Efficient and Affordable Training System for Collaborative Filtering Based Recommendation on CPUs
May 03, 2023

HALOC: Hardware-Aware Automatic Low-Rank Compression for Compact Neural Networks
Jan 20, 2023

SOLAR: A Highly Optimized Data Loading Framework for Distributed Training of CNN-based Scientific Surrogates
Nov 04, 2022

H-GCN: A Graph Convolutional Network Accelerator on Versal ACAP Architecture
Jun 28, 2022

COMET: A Novel Memory-Efficient Deep Learning Training Framework by Using Error-Bounded Lossy Compression
Nov 18, 2021