Picture for Reza Yazdani Aminabadi

Reza Yazdani Aminabadi

DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference

Add code
Jan 09, 2024
Viaarxiv icon

ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks

Add code
Dec 18, 2023
Viaarxiv icon

ZeroQuant-HERO: Hardware-Enhanced Robust Optimized Post-Training Quantization Framework for W8A8 Transformers

Add code
Oct 26, 2023
Viaarxiv icon

DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales

Add code
Aug 02, 2023
Figure 1 for DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales
Figure 2 for DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales
Figure 3 for DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales
Figure 4 for DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales
Viaarxiv icon

Understanding INT4 Quantization for Transformer Models: Latency Speedup, Composability, and Failure Cases

Add code
Jan 27, 2023
Viaarxiv icon

DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale

Add code
Jun 30, 2022
Figure 1 for DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale
Figure 2 for DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale
Figure 3 for DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale
Figure 4 for DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale
Viaarxiv icon

ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers

Add code
Jun 04, 2022
Figure 1 for ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers
Figure 2 for ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers
Figure 3 for ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers
Figure 4 for ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers
Viaarxiv icon

Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model

Add code
Feb 04, 2022
Figure 1 for Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
Figure 2 for Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
Figure 3 for Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
Figure 4 for Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
Viaarxiv icon

DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale

Add code
Jan 14, 2022
Figure 1 for DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale
Figure 2 for DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale
Figure 3 for DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale
Figure 4 for DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale
Viaarxiv icon

ZeRO-Offload: Democratizing Billion-Scale Model Training

Add code
Jan 18, 2021
Figure 1 for ZeRO-Offload: Democratizing Billion-Scale Model Training
Figure 2 for ZeRO-Offload: Democratizing Billion-Scale Model Training
Figure 3 for ZeRO-Offload: Democratizing Billion-Scale Model Training
Figure 4 for ZeRO-Offload: Democratizing Billion-Scale Model Training
Viaarxiv icon