Masahiro Tanaka

GRIN: GRadient-INformed MoE

Sep 18, 2024
Via arXiv

Training Ultra Long Context Language Model with Fully Pipelined Distributed Transformer

Aug 30, 2024
Via arXiv

LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs

Jul 04, 2024
Via arXiv

Universal Checkpointing: Efficient and Flexible Checkpointing for Large Scale Distributed Training

Jun 27, 2024
Via arXiv

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Apr 23, 2024
Via arXiv

DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference

Jan 09, 2024
Via arXiv

DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies

Oct 11, 2023
Via arXiv

DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models

Sep 25, 2023
Via arXiv

DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales

Aug 02, 2023
Via arXiv

Automatic Graph Partitioning for Very Large-scale Deep Learning

Mar 30, 2021
Via arXiv