Conglong Li

LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs

Jul 04, 2024

DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies

Oct 11, 2023

DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention

Sep 29, 2023

DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales

Aug 02, 2023

DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing

Dec 07, 2022

Random-LTD: Random and Layerwise Token Dropping Brings Efficient Training for Large-scale Transformers

Nov 17, 2022

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Nov 09, 2022

ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers

Jun 04, 2022

Extreme Compression for Pre-trained Transformers Made Simple and Efficient

Jun 04, 2022

Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam

Feb 12, 2022