
Dingwen Tao

SDP4Bit: Toward 4-bit Communication Quantization in Sharded Data Parallelism for LLM Training

Oct 20, 2024

Accelerating Communication in Deep Learning Recommendation Model Training with Dual-Level Adaptive Lossy Compression

Jul 05, 2024

FastCLIP: A Suite of Optimization Techniques to Accelerate CLIP Training with Limited Resources

Jul 01, 2024

GWLZ: A Group-wise Learning-based Lossy Compression Framework for Scientific Data

Apr 20, 2024

Benchmarking and In-depth Performance Study of Large Language Models on Habana Gaudi Processors

Sep 29, 2023

HEAT: A Highly Efficient and Affordable Training System for Collaborative Filtering Based Recommendation on CPUs

May 03, 2023

HALOC: Hardware-Aware Automatic Low-Rank Compression for Compact Neural Networks

Jan 20, 2023

SOLAR: A Highly Optimized Data Loading Framework for Distributed Training of CNN-based Scientific Surrogates

Nov 04, 2022

H-GCN: A Graph Convolutional Network Accelerator on Versal ACAP Architecture

Jun 28, 2022

COMET: A Novel Memory-Efficient Deep Learning Training Framework by Using Error-Bounded Lossy Compression

Nov 18, 2021